







INTRODUCTION TO 
MATHEMATICAL PROBABILITY 




Introduction to 

MATHEMATICAL
PROBABILITY 


J. V. Uspensky 

Late professor of mathematics, Stanford University 


McGRAW-HILL BOOK COMPANY, Inc. 
New York Toronto London 



Copyright, 1937, by the
McGraw-Hill Book Company, Inc.
“Copyright renewed 1965 by Lucille Zander Uspensky.”

PRINTED IN THE UNITED STATES OF AMERICA 

All rights reserved. This book, or 
parts thereof, may not be reproduced
in any form without permission of
the publishers.




PREFACE 


This book is an outgrowth of lectures on the theory of probability
which the author has given at Stanford University for a number of
years. At first a short mimeographed text covering only the elementary
parts of the subject was used for the guidance of students. As time
went on and the scope of the course was gradually enlarged, the necessity
of putting into the hands of students a more elaborate exposition
of the most important parts of the theory of probability became apparent.
Accordingly, a rather large manuscript was prepared for this purpose. The
author did not plan at first to publish it, but students and other persons
who had the opportunity to peruse the manuscript were so persuasive that
publication was finally arranged.

The book is arranged in such a way that the first part of it, consisting
of Chapters I to XII inclusive, is accessible to a person without advanced
mathematical knowledge. Chapters VII and VIII are, perhaps, exceptions.
The analysis in Chapter VII is rather involved and a better way
to arrive at the same results would be very desirable. At any rate, a
reader who does not have time or inclination to go through all the
intricacies of this analysis may skip it and retain only the final results,
found in Section 11. Chapter VIII, though dealing with interesting
and historically important problems, is not important in itself and may
without loss be omitted by readers. Chapters XIII to XVI incorporate
results of modern investigations. Naturally they are more complex
and require more mature mathematical preparation.

Three appendices are added to the book. Of these the second is by
far the most important. It gives an outline of the famous Tshebysheff-Markoff
method of moments applied to the proof of the fundamental
theorem previously established by another method in Chapter XIV.

No one will dispute Newton’s assertion: “In scientiis addiscendis
exempla magis prosunt quam praecepta” (in learning the sciences,
examples are of more use than precepts). But this is especially true
of the theory of probability. Accordingly, not only are a large number of
illustrative problems discussed in the text, but at the end of each chapter
a selection of problems is added for the benefit of students. Some of
them are mere examples. Others are more difficult problems, or even
important theorems which did not find a place in the main text. In all
such cases sufficiently explicit indications of solution (or proofs) are given.


The book does not go into applications of probability to other sciences. 
To present these applications adequately another volume of perhaps 
larger size would be required. 

No one is more aware than the author of the many imperfections in 
the plan of this book and its execution. To present an entirely
satisfactory book on probability is, indeed, a difficult task. But even with
all these imperfections we hope that the book will prove useful, especially 
since it contains much material not to be found in other books on the 
same subject in the English language. 

J. V. Uspensky. 

Stanford University, 

September, 1937.
INTRODUCTION TO
MATHEMATICAL PROBABILITY 

INTRODUCTION 

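Quanto enim minus rationis terminis comprehendi posse
videbatur, quae fortuita sunt atque incerta, tanto admirabilior
ars censebitur, cui ista quoque subjacent.

(For the less it seemed possible to comprehend within the bounds
of reason things that are fortuitous and uncertain, the more
admirable will the art be judged to which even these are subject.)

Chr. Huygens,

De ratiociniis in ludo aleae.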

1. It is always difficult to describe with adequate conciseness and
clarity the object of any particular science; its methods, problems, and
results are revealed only gradually. But if one must define the scope
of the theory of probability, the answer may be this: The theory of
probability is a branch of applied mathematics dealing with the effects of
chance. Here we encounter the word “chance,” which is often used in
everyday language but with rather indefinite meaning. To make clearer
the idea conveyed by this word, we shall try first to clarify the opposite
idea expressed in the word “necessity.” Necessity may be logical or
physical. The statement “The sum of the angles in a triangle is equal
to two right angles” is a logical necessity, provided we assume the
axioms of Euclidean geometry; for in denying the conclusion of the
admitted premises, we violate the logical law of contradiction.

The following statements serve to illustrate the idea of physical 
necessity: 

A piece of iron falls down if not supported.

Water boils if heated to a sufficiently high temperature. 

A die thrown on a board never stands on its edge. 

The logical structure of all these statements is the same: When certain
conditions which may be termed “causes” are fulfilled, a definite effect
occurs of necessity. But the nature of this kind of necessity is different
from that of logical necessity. The latter, with our organization of
mind, appears absolute, while physical necessity is only a result of
extensive induction. We have never known an instance in which water,
heated to a high temperature, did not boil; or a piece of iron did not fall
down; or a die stood on its edge. For that reason we are led to believe
that in the preceding examples (and in innumerable similar instances)
the effect follows from its “cause” of necessity.

Instead of the term “physical necessity” we may introduce the
abstract idea of “natural law.” Thus, it is a “natural law” that the
piece of iron left without support will fall down. Natural laws derived
from extensive experiments or observations may be called “empirical
laws” to distinguish them from theoretical laws. In all exact sciences
which have reached a high degree of development, such as astronomy,
physics, and chemistry, scientists endeavor to build up an abstract and
simplified image of the infinitely complex physical world, an image
which can be described in mathematical terms. With the help of
hypotheses and some artificial concepts, it becomes possible to derive 
mathematically certain laws which, when applied to the world of reality, 
represent many natural phenomena with an amazing degree of accuracy. 
It is true that in the development of the sciences it sometimes becomes 
necessary to recast the previously accepted image of the physical world, 
but it is remarkable that the fundamental theoretical laws even then 
undergo but slight modification in substance or interpretation. 

The chief endeavor of the exact sciences is the discovery of natural 
laws, and their formulation is of the greatest importance to the promotion 
of human knowledge in general and to the extension of our powers over 
natural phenomena. 

Are the events caused by natural laws absolutely certain? No,
but for all practical purposes they may be considered as certain. It is
possible that one or another of the natural laws may fail, but such
failure would constitute a real “miracle.” However, even granting that
the possibility of miracles is consistent with the nature of scientific
knowledge, in practice this possibility may be disregarded.

2. If the preceding explanations throw a faint light upon the concept
of necessity, it now remains to illuminate by comparison some characteristic
features inherent in the concept of “chance.” To say that chance
is a denial of necessity is too vague a statement, but examples may help
us to understand it better.

If a die is thrown upon a board we are certain that one of the six faces
will turn up. But whether a particular face will show depends on what
we call chance and cannot be predicted. Now, in the act of tossing a
die there are some conditions known to us: first, that it is nearly cubic
in shape; further, if it is a good die, its material is as nearly as possible
homogeneous. Besides these known conditions, there are other factors
influencing the motion of the die which are completely inaccessible to our
knowledge. First among them are the initial position and the impulse
imparted by the player’s hand. These depend on an “act of will,” an
agent which may act without any recognizable motivation, and therefore
they are outside the domain of rational knowledge. Second, supposing
the initial conditions known, the complexity of the resulting motion
defies any possibility of foreseeing the final result.
Another example: If equal numbers of white and black balls, which do
not differ in any respect except in color, are concealed in an urn, and we
draw one of them blindly, it is certain that its color will be either white 
or black, but whether it will be black or white we cannot predict: that 
depends on chance. In this example we again have a set of known 
conditions: namely, that balls in equal numbers are white and black, and 
that they are not distinguishable except in color. But the final result 
depends on other conditions completely outside our knowledge. First, 
we know nothing about the respective positions of the white and black 
balls; second, the choice of one or the other depends on an act of will. 

It is an observed fact that the numbers of marriages, divorces, births,
deaths, suicides, etc., per 1,000 of population, in a country with nearly 
settled living conditions and during not too long a period of time, do not 
remain constant, but oscillate within comparatively narrow limits. For 
a given year it is impossible to predict what will be their numbers: that 
depends on chance. For, besides some known conditions, such as the 
level of prosperity, sanitation, and many other things, there are unnumbered
factors completely outside our knowledge.

Many other examples of a similar kind can be cited to illustrate the 
notion of chance. They all possess a common logical structure which 
can be described as follows: an event A may materialize under certain
known or “fixed” conditions, but not necessarily; for under the same fixed
conditions other events B, C, D, . . . are also possible. The materialization
of A depends also upon other factors completely outside our
control and knowledge. Consequently, whether A will materialize or
not under such circumstances cannot be foreseen; the materialization of
A is due to chance, or, to express it concisely, A is a contingent event.

3. The idea of necessity is closely related to that of certainty. Thus 
it is “certain” that everybody will die in the due course of time. In 
the same way the idea of chance is related to that of probability or
likelihood. In everyday language, the words “probability” and “probable”
are used with different shades of meaning. By saying, “Probably it will
rain tomorrow,” we mean that there are more signs indicating rainy
weather than fair for tomorrow. On the other hand, in the statement,
“There is little probability in the story he told us,” the word
“probability” is used in the sense of credibility. But henceforth we shall use
the word as equivalent to the degree of credence which we may place 
in the possibility that some contingent event may materialize. The 
“degree of credence” implies an almost instinctive desire to compare 
probabilities of different events or facts. That such comparison is 
possible one can gather from the following examples: 

I live on the second floor and can reach the ground either by using 
the stairway or by jumping from the window. Either way I might be 
injured, though not necessarily. How do the probabilities of being
injured compare in the two cases? Everyone, no doubt, will say that
the probability of being injured by jumping from the window is “greater”
than the probability of being injured while walking down the stairway. 
Such universal agreement might be due either to personal experience or 
merely to hearsay about similar experiences of other persons. 

An urn contains an equal number of white and black balls that are 
similar in all respects except color. One ball is drawn. It may be either 
black or white. How do the probabilities of these two cases compare? 
One almost instinctively answers: “They are equal.”

Now, if there are 10 white balls and 1 black ball in the urn, what 
about the probabilities of drawing a white or a black ball? Again one 
would say without hesitation that the probability of drawing a white ball 
is greater than that of drawing a black ball. 

Thus, probability appears to be something which admits of comparisons
in magnitude, but so far only in the same way as in the intensity of
pain produced by piercing the skin with needles.

But it is a noteworthy observation that men instinctively try to 
characterize probabilities numerically in a naive and unscientific manner. 
In the sporting sections of newspapers we regularly read predictions
that in a coming race a certain horse has two chances against one to 
win over another horse, or that the chances of two football teams are as 
10 to 7, etc. No doubt experts do know much about the respective 
horses and their riders, or the comparative strengths of two competing 
football teams, but their numerical estimates of chances have no other 
merit than to show the human tendency to assign numerical values to 
probabilities which most likely cannot be expressed in numbers. 

It is possible that a man endowed with good common sense and ripe 
judgment can weigh all available evidence in order to compare the 
probabilities of the various possible outcomes and to direct his actions 
accordingly so as to secure profit for himself or for society. But precise 
conclusions can never be attained unless we find a satisfactory way to 
represent or to measure probabilities by numbers, at least in some cases. 

4. As in other fields of knowledge, in attempting to measure probabilities
by numbers, we encounter difficulties that cannot be avoided
except by making certain ideal assumptions and agreements. In
geometry (we speak of applied and not of abstract geometry), before 
explaining how lengths of rectilinear segments can be measured, we must 
first agree on criteria of equality of two segments. Similarly, in dealing 
with probability, the first step is to answer the question: When may two
contingent events be considered as equally probable or, to use a more
common expression, equally likely? From the statements of Jacob
Bernoulli, one of the founders of the mathematical theory of probability,
one can infer the following criterion of equal probability: 
Two contingent events are considered as equally probable if, after taking
into consideration all relevant evidence, one of them cannot be expected in 
preference to the other. 

Certainly there is some obscurity in this criterion, but it is hardly 
possible to substitute any better one. To be perfectly honest, we must 
admit that there is an unavoidable obscurity in the principles of all the 
sciences in which mathematical analysis is applied to reality. 

The application of Bernoulli's criterion to particular cases is beset 
with difficulties and requires good common sense and keen judgment. 
There is much truth in Laplace’s statement: “La théorie des probabilités
n’est au fond que le bon sens réduit au calcul” (the theory of probabilities
is at bottom nothing but common sense reduced to calculation).

To elucidate the nature of these difficulties, let us consider an urn 
filled with white and black balls, but in unknown proportion. The only 
evidence we have, namely, that there are both white and black balls in 
the urn, in this case appears insufficient for any conclusion about the 
respective probabilities of drawing a white or a black ball. We instinctively
think of the numbers of the two kinds of balls, and, being in
ignorance on this point, we are inclined to suspend judgment. But if we 
know that white and black balls are equal in number and distributed 
without any sort of regularity, this knowledge appears sufficient to 
assume the equality of the probabilities of drawing a white or a black 
ball. It is possible that, perhaps unconsciously, we are influenced by the 
commonly known fact that if we repeatedly draw a ball out of the urn 
many times, returning the ball each time before drawing again, the white 
and the black balls appear in nearly equal numbers. 

If an urn contains a certain number of identical balls distinguished 
from one another by some characteristic signs, for example, by the 
numbers 1, 2, 3, . . . , the knowledge that the balls are identical and 
are distributed without regularity suffices in this case to cause us to 
conclude that the probabilities for drawing any of the balls should be 
considered as equal. Again, in so readily assuming this conclusion we 
may be influenced by the fact empirically observed (by ourselves or by 
others) that in a long series of drawings, with balls being restored to 
the urn after each withdrawal, the balls appear with nearly the same 
frequency. 
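The frequency argument in the last two paragraphs can be imitated with a short simulation. This is only a sketch under an explicit assumption: Python's `random.Random` generator stands in for blind drawing from a well-mixed urn, so equal probabilities are built into the model. The run therefore shows what equal probabilities would predict for a long series of drawings with replacement; it is not an empirical confirmation. The function name `drawing_frequencies` and the seed are illustrative choices.

```python
import random
from collections import Counter


def drawing_frequencies(num_balls, num_draws, seed=0):
    """Draw from an urn of balls numbered 1..num_balls, restoring the ball
    after each draw, and return the relative frequency of each number."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(rng.randint(1, num_balls) for _ in range(num_draws))
    return {ball: counts[ball] / num_draws for ball in range(1, num_balls + 1)}


freqs = drawing_frequencies(num_balls=6, num_draws=100_000)
for ball, f in freqs.items():
    print(ball, round(f, 3))  # each value should lie close to 1/6
```

With 100,000 drawings the deviations from 1/6 are typically of the order of a thousandth, which is the "nearly the same frequency" of the text.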

An ordinary die is tossed. Should we consider the possible numbers 
of points 1, 2, 3, 4, 5, 6 as equally probable? To pronounce any judgment,
we must know something about the die. If it is known that the
die has a regular cubic shape and that its material is homogeneous, we 
readily agree on the equal probabilities of all the numbers of points 
1, 2, 3, 4, 5, 6. And this a priori conclusion, based on Bernoulli’s criterion,
agrees with the observed fact that each number of points does
appear nearly an equal number of times in a long series of throws, if the 
die is a good one. However, if we only know that the die has a regular
shape, but not whether or not it is loaded, it is only sensible to suspend 
judgment. 

These examples show that before trying to apply Bernoulli’s criterion, 
we must have at our disposal some evidence the amount of which cannot 
be determined by any general rules. It may also be that reasoning a
priori must be supplemented by some empirical evidence. In some
cases, lacking sufficient grounds to assert equal probabilities for two 
events, we may assume them as a hypothesis, to be kept until for some 
reason we are forced to abandon it. 

5. Besides the ticklish question: When are we entitled to consider 
events as equally probable? there is another fundamental assumption 
required to make possible the measurement of probabilities by numbers. 

Events $a_1, a_2, \ldots, a_n$ form an exhaustive set of possibilities under
certain fixed conditions S if at least one of them must necessarily
materialize. They are mutually exclusive if no two of them can materialize
simultaneously. The fundamental assumption referred to consists in
the possibility of subdividing results consistent with the conditions S
into a number of exhaustive, mutually exclusive, and equally likely
events, or cases (as they are commonly called):

$a_1, a_2, \ldots, a_n$.

This being granted, the probability of any one of these cases is assumed
to be 1/n.
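As a minimal sketch, one can model events as sets of elementary outcomes (a modern convention assumed here, not the book's own formulation) and check the two conditions of this section mechanically. The helper names `is_exhaustive` and `is_mutually_exclusive` are hypothetical, and the six faces of a regular, homogeneous die are taken as the equally likely cases.

```python
from fractions import Fraction
from itertools import combinations


def is_exhaustive(events, outcomes):
    """At least one event must occur: together the events cover every outcome."""
    return set().union(*events) == set(outcomes)


def is_mutually_exclusive(events):
    """No two events can occur together: every pair of events is disjoint."""
    return all(not (e & f) for e, f in combinations(events, 2))


# Illustration: a regular, homogeneous die; each face is one case.
outcomes = [1, 2, 3, 4, 5, 6]
cases = [{face} for face in outcomes]

assert is_exhaustive(cases, outcomes) and is_mutually_exclusive(cases)
n = len(cases)
probability_of_each_case = Fraction(1, n)
print(probability_of_each_case)  # 1/6
```

Note that the code can only check exhaustiveness and mutual exclusion; equal likelihood is an assumption about the die that no set computation can supply.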

An event A may materialize in several mutually exclusive particular
forms $\alpha, \beta, \ldots, \lambda$; that is, if A occurs, then one and only one
of the events $\alpha, \beta, \ldots, \lambda$ occurs also, and conversely the occurrence
of one of these events necessitates the occurrence of A. Thus, if A consists in
drawing an ace from a deck of cards, A may materialize in four mutually
exclusive forms: as an ace of hearts, diamonds, clubs, or spades.

Let an event A be represented by its particular forms $a_1, a_2, \ldots, a_m$,
which together with other events $a_{m+1}, a_{m+2}, \ldots, a_n$ constitute an
exhaustive set of mutually exclusive and equally likely cases consistent with
the conditions S. Events $a_1, a_2, \ldots, a_m$ are called “cases favorable” to A.

Definition of Mathematical Probability. If, consistent with conditions
S, there are n exhaustive, mutually exclusive, and equally likely cases, and
m of them are favorable to an event A, then the mathematical probability of
A is defined as the ratio m/n.

In drawing a card from a full deck there are 52 and no more mutually 
exclusive and equally likely cases; 4 of them are favorable for drawing an 
ace; hence the probability of drawing an ace is 4/52 = 1/13.
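The card example can be checked by brute-force counting. The sketch below assumes a deck encoded as (denomination, suit) pairs and uses a hypothetical helper `classical_probability` that computes m/n with exact fractions from Python's standard `fractions` module.

```python
from fractions import Fraction
from itertools import product

DENOMINATIONS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]


def classical_probability(cases, is_favorable):
    """Ratio m/n of favorable cases to all equally likely cases."""
    cases = list(cases)
    m = sum(1 for case in cases if is_favorable(case))
    return Fraction(m, len(cases))


deck = list(product(DENOMINATIONS, SUITS))  # 52 (denomination, suit) pairs
p_ace = classical_probability(deck, lambda card: card[0] == "A")
print(len(deck), p_ace)  # 52 1/13
```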

From an urn containing 10 white, 20 black, and 5 red balls, one ball is
drawn. Here, distinguishing individual balls, we have 35 equally likely
cases. Among them there are 10, 20, and 5 cases favorable, respectively,
to a white, a black, or a red ball. Hence the probabilities of drawing a
white, a black, or a red ball are, respectively, 2/7, 4/7, and 1/7.
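The same counting applies to the urn. In this sketch each list entry stands for one individual ball, that is, one of the 35 equally likely cases; the helper name `probability_of_color` is illustrative only.

```python
from fractions import Fraction

# The urn of the text: each list entry is one individual ball (one case).
urn = ["white"] * 10 + ["black"] * 20 + ["red"] * 5


def probability_of_color(urn, color):
    favorable = sum(1 for ball in urn if ball == color)  # m
    return Fraction(favorable, len(urn))                 # m / n


for color in ("white", "black", "red"):
    print(color, probability_of_color(urn, color))
# white 2/7
# black 4/7
# red 1/7
```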

In the first example, instead of 52 cases, we may consider only 13 
cases according to the denominations of the cards. These cases being
regarded as equally likely, there is only one of them favorable to an
ace. The probability of drawing an ace is 1/13. This observation makes
it clear that the subdivision of all possible results into equally likely 
cases can be done in various ways. To avoid contradictory estimations 
of the same probability we must always observe the following rules: 

Two events are equally likely if each of them can be represented by 
equal numbers of equally likely forms.

Two events are not equally likely if they are represented by unequal 
numbers of equally likely forms. 

Thus, if two equally likely events are each represented by different 
numbers of their respective forms, then the latter cannot be considered as 
equally likely. 

Each card is characterized by its denomination and the suit to which 
it belongs. Noting denominations, we distinguish 13 cases, but each 
of these is represented by 4 new cases according to the suit to which the 
card belongs. Altogether we have, then, 52 cases recognized as equally 
likely; hence, the above-mentioned 13 cases should be considered as 
equally likely. 
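The consistency rule can be checked on the deck: group the 52 equally likely cards by denomination, confirm that every denomination has the same number of forms, and compare the two resulting values for the probability of an ace. The encoding of cards as (denomination, suit) pairs is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

DENOMINATIONS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

deck = list(product(DENOMINATIONS, SUITS))  # 52 equally likely cards

# Number of equally likely forms (cards) representing each denomination.
forms_per_denomination = Counter(denomination for denomination, _suit in deck)

# Every denomination has the same number (4) of equally likely forms, so by
# the rule in the text the 13 denomination cases are equally likely as well.
assert set(forms_per_denomination.values()) == {4}

p_ace_from_52 = Fraction(forms_per_denomination["A"], len(deck))
p_ace_from_13 = Fraction(1, len(forms_per_denomination))
print(p_ace_from_52, p_ace_from_13)  # 1/13 1/13
```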

In connection with the definition of mathematical probability,
mention should be made of an important principle not always explicitly
stated. If

$a_1, a_2, \ldots, a_m$; $b_1, b_2, \ldots, b_p$

are all mutually exclusive and equally likely cases consistent with
certain conditions, and the indication of the occurrence of an event B
makes cases $b_1, b_2, \ldots, b_p$ impossible, cases $a_1, a_2, \ldots, a_m$ should
still be considered as equally likely. To illustrate this principle, consider an
urn with six tickets bearing the numbers 1, 2, . . . , 6. Two tickets are
drawn in succession. If nothing is known about the number of the first
ticket, we still have six possibilities for the number of the second ticket,
which we agree to consider as equally likely. But as soon as the number
of the first ticket becomes known, there are only five cases left
concerning the number of the second ticket. According to the above
principle we must consider these five cases as equally likely.
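The ticket example can be enumerated directly. This sketch assumes the 30 ordered pairs of distinct tickets are the equally likely cases and models learning the first ticket as striking out the cases it rules out, which is what the principle above licenses. The choice of 3 for the first ticket and the helper `distribution_of_second` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

TICKETS = range(1, 7)

# All ordered draws of two tickets without replacement: 6 * 5 = 30 cases,
# taken as equally likely.
draws = list(permutations(TICKETS, 2))


def distribution_of_second(cases):
    """Probability of each value of the second ticket among the given cases."""
    counts = Counter(second for _first, second in cases)
    return {value: Fraction(c, len(cases)) for value, c in counts.items()}


# Nothing known about the first ticket: six values, each with probability 1/6.
second_probs = distribution_of_second(draws)

# First ticket known (3 is an arbitrary choice): strike out the cases it makes
# impossible and keep the remaining ones equally likely.
first_ticket = 3
remaining = [case for case in draws if case[0] == first_ticket]
conditional_probs = distribution_of_second(remaining)

for value in sorted(conditional_probs):
    print(value, conditional_probs[value])  # five values, each 1/5
```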

6. Probability as defined above is represented by a number contained
between 0 and 1. In the extreme case in which the probability is 0, it
indicates the impossibility of an event. On the contrary, in the other
extreme case in which the probability is 1, the event is certain. When
the probability is expressed by a number very near to 1, it means that
the overwhelming majority of cases are favorable to the event. On the 
contrary, a probability near to 0 shows that the proportion of favorable 
cases is small. 

From our experience we know that events with a small probability
seldom happen. For instance, if the probability of an event is
1/1,000,000, the situation may be likened to the drawing of a white ball
from an urn containing 999,999 black balls and a single white one.
This white ball is practically lost among the majority of black balls, and
for all practical purposes we may consider its extraction impossible.
Similarly, the probability 999,999/1,000,000 may be considered, from a
practical standpoint, as an indication of certainty. What limit for
smallness of probability is to be set as an indication of practical
impossibility? Evidently there is no general answer to this question.
Everything depends on the risk we can face if, contrary to expectation, an
event with a small probability should occur. Hence, the main problem
of the theory of probability consists in finding cases in which the
probability is very small or very near to 1. Instead of saying, “The
probability is very near to 1,” we shall say, “great probability,” although,
of course, the probability can never exceed 1.

7. The definition of mathematical probability in Sec. 5 is essentially 
the classical definition proposed by Jacob Bernoulli and adopted by 
Laplace and almost all the important contributors to the theory of 
probability. But, since the middle of the nineteenth century (Cournot, 
John Stuart Mill, Venn), and especially in our days, the classical definition 
has been severely criticized. Several attempts have been made to rear 
up the edifice of the mathematical theory of probability on quite a 
different definition of mathematical probability. It does not enter into 
our plan to criticize these new definitions, but, in the opinion of the 
author, many of them are self-contradictory. Modern attempts to build 
up the theory of probability as an axiomatic science may be interesting 
in themselves as mental exercises; but from the standpoint of applications
the purely axiomatic science of probability would have no more
value than, for example, would the axiomatic theory of elasticity. 

The most serious objection to the classical definition is that it can
be used only in very simple and comparatively unimportant cases like
games of chance. This objection, stressed by von Mises, is in reality
not a new one. It is one of the objections Leibnitz made against Jacob 
Bernoulli's views concerning the possibility of applications of the theory 
of probability to various important fields of human endeavor and not 
merely to games of chance. 

It is certainly true that the classical definition cannot be directly
applied in many important cases. But is it the fault of the definition
or is it rather due to our ignorance of the innermost mechanisms which,
apart from chance, contribute to the materialization or nonmaterialization
of contingent events? It seems that this is what Jacob Bernoulli
meant in his reply to Leibnitz: 

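Objiciunt primo, aliam esse rationem calculorum, aliam morborum aut
mutationum aeris; illorum numerum determinatum esse, horum indeterminatum
et vagum. Ad quod respondeo, utrumque respectu cognitionis nostrae aeque
poni incertum et indeterminatum; sed quicquam in se et sua natura tale esse,
non magis a nobis posse concipi, quam concipi potest, idem simul ab Auctore
naturae creatum esse et non creatum: quaecumque enim Deus fecit, eo ipso dum
fecit, etiam determinavit.¹

(They object, first, that the reckoning of pebbles is one thing and that of
diseases or changes of the weather another; that the number of the former
is determinate, that of the latter indeterminate and vague. To this I reply
that, with respect to our knowledge, both are to be regarded as equally
uncertain and indeterminate; but that anything should be such in itself
and by its own nature can no more be conceived by us than it can be
conceived that the same thing was at once created and not created by the
Author of nature: for whatever God has made, by the very act of making it,
He has also determined.)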

8. A brilliant example of how the profound study of a subject finally
makes it possible to apply the classical definition of mathematical
probability is afforded by the fundamental laws of genetics (a science of
comparatively recent origin, whose importance no one can deny),
discovered by the Augustinian monk Gregor Mendel (1822-1884). For
eight years Mendel² conducted experimental work in crossing different
varieties of the common pea plant with the purpose of investigating how
pairs of contrasting characters were inherited. For the pea plant there
are several pairs of such contrasting characters: round or wrinkled seeds,
tallness or dwarfness, yellow or green pod color, etc. Let us concentrate
our attention on a definite pair of contrasting characters, yellow or green
pod color. Peas with green pod color always breed true. Also some
peas with yellow color always breed true, while still others produce both
varieties. True-breeding pea plants constitute two pure races: A with
yellow pod color and B with green pod color, while plants with yellow
pods not breeding true constitute a hybrid race, C. Crossing plants of
the race A with those of the race B and planting the seeds, Mendel
obtained a first generation $F_1$ of hybrids. Letting plants of the first
generation self-fertilize and again planting their seeds to produce the
second generation $F_2$, Mendel found that in this generation there were
428 yellow pod plants and 152 green pod plants, in the ratio 2.82:1.
In regard to other contrasting characters the ratio of approximately 3:1
was observed in all cases. Later experimental work only confirmed
Mendel’s results. Thus, combined experiments of Correns, Tschermak,
and others gave, among 195,477 individuals of $F_2$, 146,802 yellow pod
plants and 48,675 green pod plants, in the ratio 3.016:1.
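The quoted ratios are easy to recompute from the counts given above. The sketch below does only that arithmetic (the function name `ratio` is illustrative) and also shows how close the yellow fraction comes to the 3/4 implied by an exact 3:1 ratio.

```python
def ratio(yellow, green):
    """Yellow-to-green ratio, read as 'x : 1'."""
    return yellow / green


# Counts of F2 plants as quoted in the text: (yellow pods, green pods).
data = {
    "Mendel": (428, 152),
    "Correns, Tschermak, and others": (146_802, 48_675),
}

for source, (yellow, green) in data.items():
    total = yellow + green
    print(f"{source}: {total} plants, ratio {ratio(yellow, green):.3f}:1, "
          f"yellow fraction {yellow / total:.4f} (an exact 3:1 gives 0.7500)")
```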

¹ To understand the beginning of this statement, see the translation from “Ars
conjectandi” in Chap. VI, p. 105.

² Mendel’s results were published in 1865, but passed completely unnoticed until
about 1900, when the same facts were rediscovered by De Vries, Correns, and Tschermak.
Modern genetics dates from about this time.
Mendel not only discovered such remarkable regularities, but also
suggested a rational explanation of the observed ratio 3:1, which with
some modifications is accepted even today. Bodies of plants and
animals are built up of enormous numbers of cells, among which the
reproductive cells, or gametes, differ from the remaining “somatic”
cells in some important qualities. Cells are not homogeneous, but
possess a definite structure. In somatic cells there are found bodies,
called chromosomes, whose number is even and the same for the same
species. Exactly half of this number of chromosomes is found in
reproductive cells. Chromosomes are supposed to be seats of hypothetical
“genes,” which are considered as bearers of various heritable characters.
A chromosome of one pure race A bearing a character a differs from the
homologous chromosome of another pure race B bearing a contrasting
character b in that they contain genes of different kinds. Since characters
a and b are borne by definite chromosomes, the situation in regard to the
two characters a and b is exactly the same as if gametes of both races
contained just one chromosome. Let us represent them symbolically by
○ and ●. In the act of fertilization a pair of paternal and maternal
gametes conjugate and form a zygote, which by division and growth
produces all cells of the filial generation. Certain of these cells become
the germ cells and are set apart for the formation, by a complicated
process, of gametes, one half of which contain the chromosome of the
paternal type and the other half that of the maternal type.

According to this theory, in crossing two individuals belonging to 
races A and B, zygotes of the first generation Fi will bt of the type 
O—0, and will produce gametes, in equal numbers, of the types 0,'0. 
Now if two individuals of Fi (hybrids) are crossed (or one individual 
self-fertilized as in the cases of some plants), one paternal gamete con¬ 
jugates with one maternal, and for the resulting zygote there are four 
possibilities: 

0—0 0—0 0—0 0—0 

These possibilities may be considered as equally probable, whence the probabilities for an individual of the generation $F_2$ to belong respectively to the races A, B, C are 1/4, 1/4, 1/2. Similarly, one easily finds that in crossing an individual of the race A with one of the hybrid race C, the probabilities of the offspring belonging to A or C are both equal to 1/2.

It is easy now to offer a rational explanation of the Mendelian ratio 3:1. In the case of pea plants, individuals of the race A and hybrids are not distinguishable in regard to the color of their pods. Hence the probability of the offspring of a hybrid plant having yellow pods is 3/4, while for the offspring to have green pods the probability is 1/4. When the generation $F_2$ consists of a great many individuals, the theory of probability shows that the ratio of the number of yellow-pod plants to the number of green-pod plants is not likely to differ much from the ratio 3:1. In crossing plants of the race A with hybrids, the offspring, if numerous, will contain plants of race A or C, respectively, in a proportion which is not likely to differ much from 1:1. And this conclusion was experimentally verified by Mendel himself.
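The counting behind the probabilities 1/4, 1/4, 1/2 can be sketched in a few lines of Python. This is a minimal illustration, assuming, as the text does, that each parent passes either of its two chromosomes with equal probability; the labels "O" and "X" (standing for ○ and ●) and the function name `offspring_distribution` are hypothetical choices made only for this sketch.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(father, mother):
    """Zygote probabilities when each parent contributes one of its two
    chromosomes with equal probability (4 equally likely conjugations)."""
    dist = {}
    for g1, g2 in product(father, mother):
        zygote = "".join(sorted(g1 + g2))  # "OX" and "XO" are the same hybrid type
        dist[zygote] = dist.get(zygote, Fraction(0)) + Fraction(1, 4)
    return dist

race_A = ("O", "O")  # pure race A
hybrid = ("O", "X")  # hybrid race C

f2 = offspring_distribution(hybrid, hybrid)
print(f2)  # {'OO': Fraction(1, 4), 'OX': Fraction(1, 2), 'XX': Fraction(1, 4)}

# Race A and hybrids look alike (both carry "O"), which gives the 3:1 ratio.
p_like_A = sum(p for z, p in f2.items() if "O" in z)
print(p_like_A)  # 3/4

# Crossing race A with a hybrid: A and C each with probability 1/2.
print(offspring_distribution(race_A, hybrid))
```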

9. If in the case of the Mendelian laws the profound study of the mechanism of heredity, together with hypothetical assumptions of the kind used in physics, chemistry, etc., paved the way for a rational explanation of observed phenomena on the basis of the theory of probability, in many other important instances we are still unable to reach the same degree of scientific understanding. The stability of statistical ratios observed in many cases suggests the idea that they should be explained on the basis of probability. For instance, it has been observed that the ratio of human male to female births is nearly 51:50 for large samples, and that this ratio is largely independent of climatic conditions, racial differences, living conditions in different countries, etc. Although the factors determining sex are known, some complications not yet sufficiently cleared up prevent the estimation of the probabilities of male and female births.

In all instances of pronounced stability of statistical ratios we may believe that some day a way will be found to estimate probabilities in such cases. Therefore many applications of the theory of probability to important problems of other sciences are based on belief in the existence of the probabilities with which we are concerned. In other cases in which the theory of probability is used, we may have grave doubts as to whether this science is applied legitimately. The fact that many applications of probability are based on belief or faith should not discourage us; for it is better to do something, though it may not be quite reliable, than nothing. Only we must not be overconfident about the conclusions reached under such circumstances.

After all, is not faith at the bottom of all scientific knowledge? Physicists speak of electrons, which have never been seen and are known only through their visible manifestations. Electrons are postulated just to coordinate into a coherent whole a large variety of observed phenomena. Is not this faith? It must be, for according to Paul (Hebrews 11:1), “Faith is the substance of things hoped for, the evidence of things not seen.”

10. In concluding this introduction it remains to give a short account of the history of the theory of probability. Although ancient philosophers discussed at length the necessity and contingency of things, it seems that the mathematical treatment of probability was not known to the ancients. Apart from casual remarks of Galileo concerning the correct evaluation of chances in a game of dice, we find the true origin of the science of probability in the correspondence between two great men of the seventeenth century, Pascal (1623-1662) and Fermat (1601-1665). A French nobleman, the Chevalier de Méré, a man of ability and great experience in gambling, asked Pascal to explain some seeming contradictions between his theoretical reasoning and the observations gathered from gambling. Pascal solved this difficulty and attacked another problem proposed to him by de Méré. On hearing from Pascal about these problems, Fermat became interested in them, and in their private correspondence these two great men laid the first foundations of the science of probability. Bertrand's statement, “Les grands noms de Pascal et de Fermat décorent le berceau de cette science” (“The great names of Pascal and Fermat adorn the cradle of this science”), cannot be disputed.

Huygens (1629-1695), a great Dutch scientist, became acquainted with the contents of this correspondence and, spurred on by the new ideas, published in 1657 the first book on probability, “De ratiociniis in ludo aleae,” in which many interesting and rather difficult problems on probabilities in games of chance were solved. To him we owe the concept of “mathematical expectation,” so important in the modern theory of probability.

Jacob Bernoulli (1654-1705) meditated on the subject of probability for about twenty years and prepared his great book, “Ars conjectandi,” which, however, was not published until 1713, eight years after his death, by his nephew, Nicholas Bernoulli. Bernoulli envisaged the subject from the most general point of view, and clearly foresaw a whole field of applications of the theory of probability outside of the narrow circle of problems relating to games of chance. To him is due the discovery of one of the most important theorems, known as “Bernoulli's theorem.”

The next great successor to Bernoulli is Abraham de Moivre (1667-1754), whose most important work on probability, “The Doctrine of Chances,” was first published in 1718 and reprinted twice, in 1738 and in 1756. De Moivre does not contribute much to the principles, but this work is justly renowned for new and powerful methods for the solution of more difficult problems. Many important results, ordinarily attributed to Laplace and Poisson, can be found in de Moivre's book.

Laplace (1749-1827), whose contributions to celestial mechanics assured him everlasting fame in the history of astronomy, was very much interested in the theory of probability from the very beginning of his scientific career. After writing several important memoirs on the subject, he finally published, in 1812, his great work “Théorie analytique des probabilités,” accompanied by the no less well-known popular exposition “Essai philosophique sur les probabilités,” destined for the general educated public. Laplace's work, on account of the multitude of new ideas, new analytic methods, and new results, in all fairness should be regarded as one of the most outstanding contributions to mathematical literature. It exercised a great influence on later writers on probability in Europe, whose work chiefly consisted in the elucidation and development of topics contained in Laplace's book.

Thus in European countries further development of the theory of probability was somewhat retarded. But the subject took on important developments in the works of Russian mathematicians: Tshebysheff (1821-1894) and his former students, A. Markoff (1856-1922) and A. Liapounoff (1857-1918). Castelnuovo, in his fine book “Calcolo delle probabilità,” rightly regards the contributions to the theory of probability due to Russian mathematicians as the most important since the time of Laplace.

At the present time interest in the theory of probability is being revived everywhere, but again the most outstanding recent contributions have been made in Russia, chiefly by three prominent mathematicians: S. Bernstein, A. Khintchine, and A. Kolmogoroff.

In closing this introduction it seems proper to quote the closing words of the “Essai philosophique sur les probabilités”:

One sees from this Essay that the theory of probabilities is at bottom only common sense reduced to calculation; it makes us appreciate with exactness what sound minds feel by a sort of instinct, often without being able to account for it. It leaves nothing arbitrary in the choice of opinions and of courses to be taken, whenever by its means the most advantageous choice can be determined. Thereby it becomes the most fortunate supplement to the ignorance and weakness of the human mind. If we consider the analytical methods to which this theory has given birth, the truth of the principles on which it is based, the fine and delicate logic that their use in the solution of problems demands, the establishments of public utility that rest upon it, and the extension it has received and may still receive through its application to the most important questions of natural philosophy and the moral sciences; if we observe, moreover, that even in matters which cannot be subjected to calculation it gives the surest insights that can guide us in our judgments, and that it teaches us to guard against the illusions that often lead us astray, we shall see that there is no science more worthy of our meditations.
CHAPTER I


COMPUTATION OF PROBABILITIES BY DIRECT
ENUMERATION OF CASES 


1. The probability of an event can be found by direct application 
of the definition when it is possible to make a complete enumeration of 
all equally likely cases, as well as of those favorable to that event. Here 
we shall consider a few problems, beginning with the simplest, to illustrate
this direct method of evaluating probabilities. 

Problem 1. Two dice are thrown. What is the probability of 
obtaining a total of 7 or 8 points? 

Solution. Suppose we distinguish the dice by the numbers 1 and 2. There are 6 possible cases as to the number of points on the first die; and each of these cases can be accompanied by any of the 6 possible numbers of points on the second die. Hence, we can distinguish altogether 6 × 6 = 36 different cases. Provided the dice are ideally regular in shape and perfectly homogeneous, we have good reason to consider these 36 cases as equally likely, and we shall so consider them.

Next, let us find out how many cases are favorable to the total of 7 points. This may happen only in the following ways:

First Die    Second Die
    1            6
    2            5
    3            4
    4            3
    5            2
    6            1

Likewise, for 8 points:

First Die    Second Die
    2            6
    3            5
    4            4
    5            3
    6            2

That is, out of the total number of 36 cases there are 6 cases favorable to 7 points and 5 cases favorable to 8 points; hence, the probability of obtaining 7 points is 6/36 = 1/6 and the probability of obtaining 8 points is 5/36.
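The direct enumeration used in this solution can be mechanized. The Python sketch below is only an illustration: the helper name `dice_total_probability` is our own, and it assumes, as the text does, that all ordered outcomes are equally likely. It counts favorable cases and divides by the total number of cases, using exact fractions.

```python
from fractions import Fraction
from itertools import product

def dice_total_probability(total: int, dice: int = 2) -> Fraction:
    """Favorable cases divided by all 6**dice equally likely ordered outcomes."""
    cases = list(product(range(1, 7), repeat=dice))
    favorable = sum(1 for c in cases if sum(c) == total)
    return Fraction(favorable, len(cases))

print(dice_total_probability(7))  # 1/6
print(dice_total_probability(8))  # 5/36
```

The optional `dice` parameter lets the same function handle three or more dice, at the cost of listing all $6^{\text{dice}}$ outcomes.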
2. Problem 2. A coin is tossed three times in succession. What is the probability of obtaining 2 heads? What is the probability of obtaining tails at least once?
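The same kind of enumeration applies to Problem 2. This sketch reads “2 heads” as exactly two heads and treats the 8 ordered sequences of three tosses as equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# The 2**3 = 8 equally likely ordered sequences of three tosses.
outcomes = list(product("HT", repeat=3))

# Reading "2 heads" as exactly two heads.
two_heads = Fraction(sum(1 for o in outcomes if o.count("H") == 2), len(outcomes))
at_least_one_tail = Fraction(sum(1 for o in outcomes if "T" in o), len(outcomes))

print(two_heads, at_least_one_tail)  # 3/8 7/8
```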

Problems for Solution

1. What is the probability of obtaining 9, 10, 11 points with 3 dice? Ans. 25/216, 27/216, 27/216.

2. What is the probability of obtaining 2 heads and 2 tails when 4 coins are thrown? Ans. 3/8.

3. Two urns contain respectively 3 white, 7 red, 15 black balls, and 10 white, 3 red, 9 black balls. One ball is taken from each urn. What is the probability that they both will be of the same color? Ans. 186/550 = 93/275.

4. What is the probability that of 6 cards taken from a full pack, 3 will be black and 3 red? Ans. 13000/39151 = 0.332 approximately.

5. Ten cards are taken from a full pack. What is the probability of finding among them (a) at least one ace; (b) at least two aces? Ans. (a) 349/595; (b) 1257/7735.

6. The face cards are removed from a full pack. Out of the 40 remaining cards, 4 are drawn. What is the probability that they belong to different suits?

Ans. 1000/9139.

7. Under the same conditions, what is the probability that the 4 cards belong to different suits and different denominations? Ans. 504/9139.

8. Five cards are taken from a full pack. Find the probabilities (a) that they are of different denominations; (b) that 2 are of the same denomination and 3 scattered; (c) that one pair is of one denomination and another pair of a different denomination, and one odd; (d) that 3 are of the same denomination and 2 scattered; (e) that 2 are of one denomination and 3 of another; (f) that 4 are of one denomination and 1 of another.

Ans. (a) 2112/4165; (b) 352/833; (c) 198/4165; (d) 88/4165; (e) 6/4165; (f) 1/4165.
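The answers to Problem 8 can be checked by counting hands with binomial coefficients. This is a sketch based on our own case-by-case count (for example, case (a) chooses 5 distinct denominations and then one of 4 suits for each card); `math.comb` is Python's standard binomial coefficient. Since the six cases exhaust all possible hands, their counts must add up to $\binom{52}{5}$, which the final assertion checks.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 5)  # equally likely 5-card hands from a 52-card pack

# Number of favorable hands for each case (a)-(f) of the problem.
counts = {
    "a": comb(13, 5) * 4**5,                      # 5 different denominations
    "b": 13 * comb(4, 2) * comb(12, 3) * 4**3,    # one pair, 3 scattered
    "c": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,  # two pairs and one odd card
    "d": 13 * comb(4, 3) * comb(12, 2) * 4**2,    # three alike, 2 scattered
    "e": 13 * comb(4, 3) * 12 * comb(4, 2),       # three of one, two of another
    "f": 13 * 48,                                 # four of one, one other card
}

probabilities = {k: Fraction(v, TOTAL) for k, v in counts.items()}
for k, p in probabilities.items():
    print(k, p)

# The six cases are mutually exclusive and exhaustive.
assert sum(counts.values()) == TOTAL
```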

9. What is the probability that 5 tickets taken in succession in the French lottery will present an increasing or decreasing sequence of numbers? Ans. 1/60.

10. What is the probability that among 5 tickets drawn in the French lottery there is at least one with a one-digit number? Ans. 46282/110983 = 0.417.
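For Problem 10 the opposite event (no one-digit number among the 5 tickets) is easier to count. The sketch assumes the classical French lottery of 90 tickets numbered 1 to 90, of which 5 are drawn; those parameters are our assumption and are not stated in this section.

```python
from fractions import Fraction
from math import comb

N_TICKETS = 90  # assumption: tickets numbered 1..90, as in the classical French lottery
DRAWN = 5
ONE_DIGIT = 9   # tickets 1..9

# Opposite event: all 5 tickets come from the 81 two-digit numbers.
p_none = Fraction(comb(N_TICKETS - ONE_DIGIT, DRAWN), comb(N_TICKETS, DRAWN))
p_at_least_one = 1 - p_none
print(p_at_least_one, round(float(p_at_least_one), 3))  # 46282/110983 0.417
```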

11. Twelve balls are distributed at random among three boxes. What is the probability that the first box will contain 3 balls?

Ans. $\binom{12}{3}\,2^{9}/3^{12} = 112640/531441 \approx 0.2120$.

12. In Prob. 12 (page 22), what is the most probable number of balls in a specified box? Ans. The probability

$$P_h = \frac{n!}{h!\,(n-h)!}\left(\frac{1}{N}\right)^{h}\left(1-\frac{1}{N}\right)^{n-h}$$

is the greatest if the integer h is determined by the conditions

$$\frac{n-N+1}{N} \le h \le \frac{n+1}{N}.$$

13. Apply these considerations to the case of n = 200, N = 20. Ans. Since h = 10, the inequalities on page 23 give bounds for $P_{10}$, from which, to 3 decimals,

$$P_{10} = 0.128.$$
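Problems 12 and 13 can also be explored numerically. The sketch below computes $P_h$ with exact rational arithmetic and finds its maximum by direct search, rather than by the inequalities on page 23; `box_count_probability` and `most_probable_count` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from math import comb

def box_count_probability(h: int, n: int, N: int) -> Fraction:
    """P_h: chance that a specified box out of N receives exactly h of n balls."""
    p = Fraction(1, N)  # each ball falls into the specified box with probability 1/N
    return comb(n, h) * p**h * (1 - p) ** (n - h)

def most_probable_count(n: int, N: int) -> int:
    """Value of h maximizing P_h, found by direct search over h = 0..n."""
    return max(range(n + 1), key=lambda h: box_count_probability(h, n, N))

n, N = 200, 20
h = most_probable_count(n, N)
print(h, round(float(box_count_probability(h, n, N)), 3))
# The text's conditions (n - N + 1)/N <= h <= (n + 1)/N hold for this h.
print((n - N + 1) / N <= h <= (n + 1) / N)
```

For n = 200 and N = 20 the search returns h = 10 with $P_{10}$ close to 0.128, in agreement with the answer to Problem 13.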




14. Four different objects, 1, 2, 3, 4, are distributed at random on four places marked 1, 2, 3, 4. What is the probability that none of the objects occupies the place corresponding to its number? Ans. 3/8.
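Problem 14 is small enough to check by listing all 4! = 24 equally likely arrangements. In this sketch (the function name is ours), an arrangement is a permutation `perm` whose entry at a given place is the object put there.

```python
from fractions import Fraction
from itertools import permutations

def prob_no_object_in_own_place(k: int) -> Fraction:
    """Objects 1..k placed at random on places 1..k; probability that no
    object lands on the place bearing its own number."""
    arrangements = list(permutations(range(1, k + 1)))  # k! equally likely cases
    favorable = sum(
        1 for perm in arrangements
        if all(obj != place for place, obj in enumerate(perm, start=1))
    )
    return Fraction(favorable, len(arrangements))

print(prob_no_object_in_own_place(4))  # 3/8
```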

15. Two urns contain, respectively, 1 black and 2 white balls, and 2 black and 1 white ball. One ball is transferred from the first urn into the second, after which a ball is drawn from the second urn. What is the probability that it is white?

Ans. 5/12.

16. What is the probability of getting 20 points with 6 dice?

Ans. 4221/46656 = 0.09047.
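For Problem 16, listing all $6^6 = 46656$ outcomes is possible but wasteful. The sketch below instead builds the distribution of totals one die at a time (a simple convolution; the function name `ways_to_total` is our own), which also reproduces the counts 25, 27, 27 behind Problem 1.

```python
from fractions import Fraction

def ways_to_total(total: int, dice: int) -> int:
    """Number of ordered outcomes of `dice` six-sided dice with the given total."""
    counts = {0: 1}  # zero dice: total 0 in exactly one way
    for _ in range(dice):
        nxt = {}
        for s, c in counts.items():
            for face in range(1, 7):
                nxt[s + face] = nxt.get(s + face, 0) + c
        counts = nxt
    return counts.get(total, 0)

ways = ways_to_total(20, 6)
print(ways, Fraction(ways, 6**6), round(ways / 6**6, 5))  # 4221 469/5184 0.09047
```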

17. An urn contains a white and b black balls. Balls are drawn one by one until only those of the same color are left. What is the probability that they are white?

Ans. a/(a + b).

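18. In an urn there are n groups of p objects each. Objects in different groups are distinguished by some characteristic property. What is the probability that among $\alpha_1 + \alpha_2 + \cdots + \alpha_n$ objects ($0 \le \alpha_i \le p$; $i = 1, 2, \ldots, n$) taken, there are $\alpha_1$ of one group, $\alpha_2$ of another, etc.? Ans. Let $\lambda$ among the numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ be equal to a, $\mu$ be equal to b, ..., $\sigma$ be equal to l. The required probability is

$$\frac{n!}{\lambda!\,\mu!\cdots\sigma!}\cdot\frac{\binom{p}{\alpha_1}\binom{p}{\alpha_2}\cdots\binom{p}{\alpha_n}}{\binom{np}{\alpha_1+\alpha_2+\cdots+\alpha_n}}.$$

Problem 8 is a particular case of this.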

19. There are N tickets numbered 1, 2, ..., N, of which n are taken at random and arranged in increasing order of their numbers: $x_1 < x_2 < \cdots < x_n$. What is the probability that $x_m = M$?

Ans.

$$\frac{\binom{M-1}{m-1}\binom{N-M}{n-m}}{\binom{N}{n}}.$$
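The answer to Problem 19 counts the samples that have $m-1$ tickets below M and $n-m$ tickets above it. A brute-force comparison on small cases is a cheap sanity check. Both function names in this sketch are hypothetical, and the parameters N = 10, n = 4 are arbitrary.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def prob_mth_equals(N: int, n: int, m: int, M: int) -> Fraction:
    """Closed form: m - 1 tickets below M, ticket M itself, n - m tickets above M."""
    return Fraction(comb(M - 1, m - 1) * comb(N - M, n - m), comb(N, n))

def prob_mth_equals_brute(N: int, n: int, m: int, M: int) -> Fraction:
    """Enumerate all C(N, n) equally likely samples (tuples come out sorted)."""
    samples = list(combinations(range(1, N + 1), n))
    hits = sum(1 for s in samples if s[m - 1] == M)
    return Fraction(hits, len(samples))

print(prob_mth_equals(10, 4, 2, 3), prob_mth_equals_brute(10, 4, 2, 3))  # 1/5 1/5
```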





CHAPTER II 

THEOREMS OF TOTAL AND COMPOUND PROBABILITY 


1. As the problems become more complex, the difficulties in enumerating cases grow, and often the computation of probabilities by direct application of the definition becomes very involved. In many cases the complications can be avoided by the use of two theorems which are fundamental in the theory of probability.

Before we can give a clear and exact statement of the first fundamental theorem, we must define what is meant by “mutually exclusive” or “incompatible” events. Events are called mutually exclusive or incompatible if the occurrence of one of them precludes the occurrence of all the others. For instance, the four events concerning the number of points on two dice

First Die    Second Die
    1            4
    2            3
    3            2
    4            1

are mutually exclusive because it is evident that as soon as one of them 
occurs, none of the others can materialize. 

On the contrary, events are compatible if it is possible for them to materialize simultaneously. For instance, the events of 5 points on one die and 5 points on the other are compatible, since in tossing two dice it is possible to get 5 points on each.

To denote the probability of an event A, we shall use the symbol (A). To denote the probability of A or B (or both) we shall use the symbol (A + B). Dealing with several events A, B, . . . L, the symbol

$$(A + B + \cdots + L)$$

will denote the probability of the occurrence of at least one of them. 
If A, B, . . . L are mutually exclusive events, this symbol represents 
the probability of the occurrence of one of them without specification as 
to which one. 

2. Now we shall state the first fundamental theorem, called the “theorem of total probability” or “theorem of addition of probabilities,” in the following way:

Theorem of Total Probability. The probability for one of the mutually exclusive events $A_1, A_2, \ldots, A_n$ to materialize is the sum of the probabilities of these events. In symbolic notation, it is expressed thus:

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n).$$

Proof. Let N be the number of all possible and equally likely cases, out of which $m_1$ cases are favorable to the event $A_1$, $m_2$ cases are favorable to the event $A_2$, ..., and finally, $m_n$ cases are favorable to the event $A_n$. These cases are all different, since the events $A_1, A_2, \ldots, A_n$ are incompatible. The number of cases favorable to either $A_1$ or $A_2$, ... or $A_n$ is therefore

$$m_1 + m_2 + \cdots + m_n.$$

Hence, by definition,

$$(A_1 + A_2 + \cdots + A_n) = \frac{m_1 + m_2 + \cdots + m_n}{N} = \frac{m_1}{N} + \frac{m_2}{N} + \cdots + \frac{m_n}{N}.$$

Again, by the definition of probability,

$$\frac{m_1}{N} = (A_1), \quad \frac{m_2}{N} = (A_2), \quad \ldots, \quad \frac{m_n}{N} = (A_n),$$

and so finally

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n),$$

as stated.

3. It is important to know that the same theorem, stated in a slightly 
different form, is especially useful in applications. An event A can
occur in several mutually exclusive forms, Ai, A 2 , . . . An, which may 
be considered as that many mutually exclusive events. Whenever A 
occurs, one of these events must occur, and conversely. Consequently, 
the probability of A is the same as the probability of one (unspecified) 
of its mutually exclusive forms. If, for instance, occurrence of 5 points 
on two dice is A, then this event occurs in 4 mutually exclusive forms, as 
tabulated above. 

From the new point of view, the theorem of total probability can be 
stated thus: 

Second Form of Theorem of Total Probability. The probability of an event A is the sum of the probabilities of its mutually exclusive forms $A_1, A_2, \ldots, A_n$; or, using symbols,

$$(A) = (A_1) + (A_2) + \cdots + (A_n).$$

Probabilities $(A_1), (A_2), \ldots, (A_n)$ are partial probabilities of incompatible forms of A. Since the probability of A is their sum, it may be called a total probability of A. Hence the name of the theorem.
In the preceding example we saw that 5 points on two dice could be obtained in 4 mutually exclusive ways. Now the probability of any one of these ways is 1/36; hence, by the preceding theorem, the probability of obtaining 5 points with two dice is

$$\frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9},$$

as it should be.

If events $A_1, A_2, \ldots, A_n$ are not only mutually exclusive, but also “exhaustive,” which means that one of them must necessarily take place, the probability that one of them will happen is a certainty, equal to 1, so that we must have

$$(A_1) + (A_2) + \cdots + (A_n) = 1.$$

An event which is not certain may or may not happen; this constitutes two mutually exclusive cases. It is customary to call the nonoccurrence of a given event A the “event opposite” to A, and we shall denote it by the symbol $\bar{A}$. Now A and $\bar{A}$ constitute two exhaustive and mutually exclusive cases. Hence, by the preceding remark,

$$(A) + (\bar{A}) = 1.$$

That is, if p is the probability of A,

$$q = 1 - p$$

represents the probability that A will not occur.

4. If an event A is considered in connection with another event B, the compound event AB consists in the simultaneous occurrence of A and B. For three events A, B, C, the compound event ABC consists in the simultaneous occurrence of A and B and C, and so on for any number of component events. We shall denote the probability of a compound event $AB\cdots L$ by the symbol

$$(AB\cdots L).$$

An event A can materialize in two mutually exclusive forms, namely, as A and B, or as A and $\bar{B}$. Hence, by the theorem of total probability,

$$(A) = (AB) + (A\bar{B}).$$

Similarly,

$$(B) = (BA) + (B\bar{A}),$$

or, since the symbol (BA) does not depend upon the order of letters,

$$(B) = (AB) + (\bar{A}B).$$

The sum (A) + (B) can be expressed as

$$(A) + (B) = (AB) + \left[(AB) + (A\bar{B}) + (\bar{A}B)\right].$$

Again, by the theorem of total probability, the sum

$$(AB) + (A\bar{B}) + (\bar{A}B)$$

represents the probability (A + B) of the occurrence of at least one of the events A or B. The preceding equation leads to the useful formula

$$(A + B) = (A) + (B) - (AB) \qquad (1)$$

which obviously is a generalization of the theorem of total probability; for (AB) = 0 if A and B are incompatible. Equation (1) can be used to derive an important inequality. Since $(A + B) \le 1$, it follows from (1) that

$$(AB) \ge (A) + (B) - 1.$$

If B itself is a compound event $A_1A_2$, this inequality leads to

$$(AA_1A_2) \ge (A) + (A_1A_2) - 1.$$

But

$$(A_1A_2) \ge (A_1) + (A_2) - 1,$$

and so

$$(AA_1A_2) \ge (A) + (A_1) + (A_2) - 2$$

for three component events. Proceeding in the same manner, we can establish the following general inequality:

$$(AA_1A_2\cdots A_{n-1}) \ge (A) + (A_1) + (A_2) + \cdots + (A_{n-1}) - (n - 1).$$

Applying this inequality to the events $\bar{A}, \bar{A}_1, \ldots, \bar{A}_{n-1}$, respectively opposite to $A, A_1, \ldots, A_{n-1}$, we get

$$(\bar{A}\bar{A}_1\cdots\bar{A}_{n-1}) \ge (\bar{A}) + (\bar{A}_1) + \cdots + (\bar{A}_{n-1}) - (n - 1),$$

or, since $(\bar{A}_i) = 1 - (A_i)$,

$$(A) + (A_1) + \cdots + (A_{n-1}) \ge 1 - (\bar{A}\bar{A}_1\cdots\bar{A}_{n-1}).$$

Now the compound event $\bar{A}\bar{A}_1\cdots\bar{A}_{n-1}$ means that neither A nor $A_1$, ... nor $A_{n-1}$ occurs. The event opposite to this is that at least one of the events $A, A_1, \ldots, A_{n-1}$ occurs. Hence,

$$1 - (\bar{A}\bar{A}_1\cdots\bar{A}_{n-1}) = (A + A_1 + \cdots + A_{n-1}),$$

and we reach the following important inequality:

$$(A + A_1 + \cdots + A_{n-1}) \le (A) + (A_1) + \cdots + (A_{n-1}).$$
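The two inequalities just derived can be checked on a concrete sample space. This sketch uses the 36 equally likely throws of two dice and two groups of events chosen arbitrarily for illustration (they are not from the text): for events of small probability the upper bound on the probability of "at least one" is informative, while for events of large probability the lower bound on the compound event is.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))  # 36 equally likely throws of two dice

def prob(event) -> Fraction:
    """Classical probability of a predicate over the sample space."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def union_and_product(events):
    p_union = prob(lambda w: any(e(w) for e in events))  # at least one occurs
    p_all = prob(lambda w: all(e(w) for e in events))    # all occur together
    total = sum(prob(e) for e in events)
    return p_union, p_all, total

# Illustrative events (our own choice).
small = [lambda w: w[0] == 5, lambda w: w[1] == 5, lambda w: w[0] + w[1] >= 10]
large = [lambda w: w[0] >= 2, lambda w: w[1] >= 2, lambda w: w[0] + w[1] <= 11]

u, _, s = union_and_product(small)
print(u, "<=", s)                      # upper bound for "at least one"
_, a, s = union_and_product(large)
print(a, ">=", s - (len(large) - 1))   # lower bound for the compound event
```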

5. Equation (1) can be extended to the case of more than two events. Let B mean the occurrence of at least one of the events $A_1$ or $A_2$. Then by (1)

$$(A + A_1 + A_2) = (A) + (A_1 + A_2) - (AB).$$

As to $(A_1 + A_2)$, its expression is given by (1). The compound event AB means the occurrence of at least one of the events $AA_1$ or $AA_2$. Hence, applying equation (1) once more, we find

$$(AB) = (AA_1 + AA_2) = (AA_1) + (AA_2) - (AA_1A_2)$$

and after due substitutions

$$(A + A_1 + A_2) = (A) + (A_1) + (A_2) - (AA_1) - (AA_2) - (A_1A_2) + (AA_1A_2).$$

Proceeding in the same way and using mathematical induction, the following general formula can be established (writing $A_0$ for A):

$$(A_0 + A_1 + \cdots + A_{n-1}) = \sum_{i}(A_i) - \sum_{i,j}(A_iA_j) + \sum_{i,j,k}(A_iA_jA_k) - \cdots + (-1)^{n-1}(A_0A_1\cdots A_{n-1}),$$

where the summations refer to all combinations of subscripts taken from the numbers 0, 1, 2, ..., n - 1, one, two, three, ..., and n at a time.
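The general formula can be verified by brute force on a finite sample space. In this sketch, which is ours and not the author's, $A_i$ is the event that die i shows a six when three dice are thrown; the alternating sum over all combinations of subscripts is compared with a direct count of the cases in which at least one of the events occurs.

```python
from fractions import Fraction
from itertools import combinations, product

space = list(product(range(1, 7), repeat=3))  # 216 equally likely throws of 3 dice

def prob(pred) -> Fraction:
    return Fraction(sum(1 for w in space if pred(w)), len(space))

# A_i: "die i shows a six", i = 0, 1, 2 (an illustrative choice of events).
events = [lambda w, i=i: w[i] == 6 for i in range(3)]

def inclusion_exclusion(evts) -> Fraction:
    """Sum over r of (-1)**(r-1) times the probabilities of all r-fold products."""
    total = Fraction(0)
    for r in range(1, len(evts) + 1):
        sign = (-1) ** (r - 1)
        for combo in combinations(evts, r):
            total += sign * prob(lambda w, c=combo: all(e(w) for e in c))
    return total

direct = prob(lambda w: any(e(w) for e in events))  # at least one event occurs
print(direct, inclusion_exclusion(events))          # 91/216 91/216
```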

6. Let A and B be two events whose probabilities are (A) and (B). It is understood that the probability (A) is determined without any regard to B, when nothing is known about the occurrence or nonoccurrence of B. When it is known that B occurred, A may have a different probability, which we shall denote by the symbol (A, B) and call the conditional probability of A, given that B has actually happened.

Now we can state the second fundamental theorem, called the “theorem of compound probability” or “theorem of multiplication of probabilities,” as follows:

Theorem of Compound Probability. The probability of the simultaneous occurrence of A and B is given by the product of the unconditional probability of the event A by the conditional probability of B, supposing that A actually occurred. In other words,

$$(AB) = (A)\cdot(B, A).$$

Proof. Let N denote the total number of equally likely cases, among which m cases are favorable to the event A. The cases favorable to A and B are to be found among the m cases favorable to A. Let their number be $m_1$. Then, by the definition of probability,

$$(AB) = \frac{m_1}{N},$$

which also can be written thus:

$$(AB) = \frac{m}{N}\cdot\frac{m_1}{m}.$$

Now the ratio m/N represents the probability of A. To find the meaning of the second factor, we observe that, assuming the occurrence of A, there are only m equally likely cases left (the remaining $N - m$ cases becoming impossible), out of which $m_1$ are favorable to B. Hence the ratio $m_1/m$ represents the conditional probability (B, A) of B, supposing that A has actually happened.

Now since

$$\frac{m}{N} = (A), \qquad \frac{m_1}{m} = (B, A),$$

the probability of the compound event AB is expressed by the product

$$(AB) = (A)\cdot(B, A).$$

Since the compound event AB involves A and B symmetrically, 
we shall also have

(AB) = (B) • (A, B). 

The theorem of compound probability can easily be extended to several events. For example, let us consider three events, A, B, C. The occurrence of A and B and C is evidently equivalent to the occurrence of the compound event AB and C. We have, therefore,

$$(ABC) = (AB)\cdot(C, AB)$$

by the theorem of compound probability. By the same theorem 
(AB) = (A) • (B, A), 

so that 

(ABC) = (A) • (B, A) • (C, AB). 

Obviously this formula can be extended to compound events consisting of more than three components.

In one particular but very important case, the expression for the compound probability can be simplified; namely, in the case of so-called “independent events.” Several events are “independent” by definition if the probability of any one of them is not affected by supplementary knowledge concerning the materialization of any number of the remaining events. For instance, if A and B represent white balls drawn from two different urns, the probability of A is the same whether the color of the ball drawn from the other urn is known or not. Similarly, granted that a coin is unbiased, heads at the first throw and heads at the second throw are independent events. In such theoretical cases the independence of events can be reasonably assumed or agreed upon. In other cases, and especially in practical applications, it is not easy to decide whether events should be considered as independent or not.

If A and B are independent, the conditional probability (B, A) is the same as the probability (B) found without any reference to A; this follows from the definition of independence. Hence, the expression of the compound probability (AB) for two independent events becomes

$$(AB) = (A)\cdot(B),$$

so that the probability of a compound event with independent components is simply equal to the product of the probabilities of the component events. This rule extends to any number of component events if they are independent. Let us consider three independent events, A, B, and C. The independence of these events implies

$$(B, A) = (B); \qquad (C, AB) = (C)$$

and hence

$$(ABC) = (A)\cdot(B)\cdot(C),$$

in accordance with the rule.

To illustrate the theorem of compound probability, let us consider two simple examples. An urn contains 2 white balls and 3 black ones. Two balls are drawn, and it is required to find the probability that they are both white. Let A be the event consisting in the white color of the first ball, and B the event consisting in the white color of the second ball. The probability (A) of extracting a white ball in the first place is

$$(A) = \frac{2}{5}.$$

To find the conditional probability (B, A), we observe that after one white ball has been drawn, 1 white and 3 black balls remain in the urn. The probability of drawing a white ball under such circumstances is

$$(B, A) = \frac{1}{4}.$$

Now, by the theorem of compound probability, we shall have

$$(AB) = \frac{2}{5}\cdot\frac{1}{4} = \frac{1}{10}.$$

Evidently, in this example we dealt with dependent events. 
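The urn example can be confirmed by enumerating all 5 × 4 = 20 equally likely ordered draws of two different balls, treating the balls as distinguishable. The sketch compares that count with the product $(A)\cdot(B, A)$ given by the theorem; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

urn = ["W", "W", "B", "B", "B"]  # 2 white and 3 black balls, distinguished by position

# Ordered draws of two different balls: 5 * 4 = 20 equally likely cases.
draws = list(permutations(range(len(urn)), 2))
both_white = sum(1 for i, j in draws if urn[i] == "W" and urn[j] == "W")
p_enumerated = Fraction(both_white, len(draws))

p_theorem = Fraction(2, 5) * Fraction(1, 4)  # (A) * (B, A)
print(p_enumerated, p_theorem)  # 1/10 1/10
```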

As an example of independent events, let a coin be tossed any given number of times; say, n times. What is the probability of having only heads? The compound event in this example consists of n independent components; namely, heads at every trial. Now the probability of heads at each trial is 1/2, so the required probability will be $1/2^n$.

Note: Two events A and B are independent by definition if

$$(A, B) = (A) \quad\text{and}\quad (B, A) = (B).$$

However, one of these conditions follows from the other. Suppose the condition

$$(A, B) = (A)$$

is fulfilled, so that A is independent of B. We have then

$$(AB) = (B)\cdot(A, B) = (B)\cdot(A).$$

On the other hand,

$$(AB) = (A)\cdot(B, A),$$

whence

$$(B, A) = (B),$$

so that B is independent of A.

Three events A, B, C are independent if the following four conditions are fulfilled:

$$(A, B) = (A); \quad (A, C) = (A); \quad (B, C) = (B); \quad (C, AB) = (C).$$

From the first three conditions it follows that

$$(B, A) = (B); \quad (C, A) = (C); \quad (C, B) = (C).$$

To show that the other requirements

$$(B, AC) = (B); \quad (A, BC) = (A)$$

are also fulfilled, we notice that

$$(ABC) = (A)\cdot(B, A)\cdot(C, AB) = (A)\cdot(B)\cdot(C)$$

because (C, AB) = (C) by hypothesis and (B, A) = (B) as proved. On the other hand,

$$(ABC) = (A)\cdot(C, A)\cdot(B, AC)$$

and (C, A) = (C). Hence, comparing with the preceding expression,

$$(B, AC) = (B).$$

Similarly, it can be shown that

$$(A, BC) = (A).$$

The independence of four events A, B, C, D is assured if the following 11 conditions are fulfilled:

$$(A, B) = (A, C) = (A, D) = (A); \quad (B, C) = (B, D) = (B); \quad (C, D) = (C);$$
$$(C, AB) = (C); \quad (D, AB) = (D, AC) = (D, BC) = (D); \quad (D, ABC) = (D).$$

And in general, the independence of n events is assured if $2^n - n - 1$ conditions of similar type are fulfilled.

If several events are independent, every two of them are independent; but this does not suffice for the independence of all the events, as can be shown by a simple example. An urn contains four tickets with numbers 112, 121, 211, 222, and one ticket is drawn. What are the probabilities that the first, second, or third digit in its number is 1? Let the events that the first, second, or third digit is 1 be denoted, respectively, by A, B, or C. Then

$$(A) = (B) = (C) = \frac{1}{2}.$$

Compound probabilities (AB), (AC), (BC) are

$$(AB) = (AC) = (BC) = \frac{1}{4},$$

since among four tickets there is only one whose number has first and second, or first and third, or second and third digits equal to 1. Now, for instance,

$$(AB) = \frac{1}{4} = \frac{1}{2}\cdot\frac{1}{2} = (A)\cdot(B),$$

whence A and B are independent. Similarly, A and C, and C and B, are independent. Thus, any two of the events A, B, C are independent, but not all three events are. For, if they were, we should have

$$(ABC) = \frac{1}{8}.$$

But (ABC) = 0, since in no ticket are all three digits equal to 1.
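The ticket example can be checked mechanically. In this sketch, positions 0, 1, 2 of the ticket number correspond to the events A, B, C, and the helper `prob` is a hypothetical name; it confirms that every pair of events satisfies the product rule while the triple does not.

```python
from fractions import Fraction

TICKETS = ["112", "121", "211", "222"]  # one ticket drawn, each with probability 1/4

def prob(*positions: int) -> Fraction:
    """Probability that the digits at all the given positions equal 1."""
    hits = sum(1 for t in TICKETS if all(t[p] == "1" for p in positions))
    return Fraction(hits, len(TICKETS))

A, B, C = 0, 1, 2  # first, second, third digit
pairwise = all(prob(x, y) == prob(x) * prob(y) for x, y in [(A, B), (A, C), (B, C)])
mutual = prob(A, B, C) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)  # True False
```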

7. The theorems of total and compound probability form the foundation of the theory of probability as a separate branch of mathematical science. They serve the purpose of finding probabilities in more complicated cases, either by being directly applied or by enabling us to form equations from which the required probabilities can be found. A few selected problems will illustrate the various ways of using these theorems.

Problem 14. An urn contains a white balls and b black balls; another 
contains c white and d black balls. One ball is transferred from the first 
urn into the second, and then a ball is drawn from the latter. What is 
the probability that it will be a white ball? , 

Solution. The event consisting in the white color of the ball drawn
from the second urn can materialize in two mutually exclusive forms:
when the transferred ball is white, and when it is black. By the
theorem of total probability, we must find the probabilities corresponding
to these two forms and add them. To find the probability of the first form,
we observe that it represents a compound event consisting in the white color
of the transferred ball, combined with the white color of the extracted ball.
The probability that the transferred ball is white is given by the fraction

$$\frac{a}{a+b}$$

and the probability that the ball removed from the second urn is white is

$$\frac{c+1}{c+d+1}$$

because before the drawing there were c + 1 white balls and d black
balls in the second urn. Hence, by the theorem of compound probability,
the probability of the first form is

$$\frac{a(c+1)}{(a+b)(c+d+1)}.$$

In the same way, we find that the probability of the second form is

$$\frac{bc}{(a+b)(c+d+1)},$$

and the sum of these two numbers,

$$\frac{ac+bc+a}{(a+b)(c+d+1)},$$

gives the probability of extracting a white ball from the second urn, after
one ball of unknown color has been transferred from the first urn. 
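As a sanity check of this formula, the following Python sketch enumerates every (transferred ball, drawn ball) pair, each of which is equally likely, and compares the count with $(ac+bc+a)/((a+b)(c+d+1))$. The function names are ours and purely illustrative.

```python
from fractions import Fraction

def white_after_transfer(a, b, c, d):
    """Brute force: every (transferred ball, drawn ball) pair is equally likely."""
    first_urn = ["white"] * a + ["black"] * b
    favorable = total = 0
    for moved in first_urn:
        second_urn = ["white"] * c + ["black"] * d + [moved]
        for drawn in second_urn:
            total += 1
            favorable += drawn == "white"
    return Fraction(favorable, total)

def formula(a, b, c, d):
    """(ac + bc + a) / ((a + b)(c + d + 1)), as derived in the text."""
    return Fraction(a * c + b * c + a, (a + b) * (c + d + 1))

print(white_after_transfer(3, 2, 4, 5), formula(3, 2, 4, 5))  # → 23/50 23/50
```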

8. Problem 15. Two players agree to play under the following
conditions: taking turns, they draw balls out of an urn containing
a white balls and b black balls, one ball at a time. He who extracts the
first white ball wins the game. What is the probability that the player
who starts will win the game?

Solution. Let A be the player who draws the first ball, and let B
be the other player. The game can be won by A: first, if he extracts a
white ball at the start; second, if A and B alternately extract 2 black
balls and then A draws a white one; third, if A and B alternately extract
4 black balls and the fifth ball, drawn by A, is white; and so on. By the
theorem of total probability, the probability for A to win the game
is the sum of the probabilities of the mutually exclusive ways (described
above) in which he can win the game. The probability of extracting a
white ball at first is

$$\frac{a}{a+b}.$$

The probability of extracting 2 black balls and then 1 white ball is found
by direct application of the theorem of compound probability. Its
expression is

$$\frac{b(b-1)a}{(a+b)(a+b-1)(a+b-2)}.$$

The probability of extracting 4 black balls and then 1 white ball is given
by

$$\frac{b(b-1)(b-2)(b-3)a}{(a+b)(a+b-1)(a+b-2)(a+b-3)(a+b-4)},$$

using the same theorem of compound probability.

In the same way we deal with all the possible and mutually exclusive
ways which would allow A to win the game. Then, by adding the partial
probabilities given above, we obtain the expression for the required
probability in the form of the sum

$$P = \frac{a}{a+b}\left[1 + \frac{b(b-1)}{(a+b-1)(a+b-2)} + \frac{b(b-1)(b-2)(b-3)}{(a+b-1)(a+b-2)(a+b-3)(a+b-4)} + \cdots\right].$$

The law of formation of different terms in this sum is obvious; and 
the sum automatically ends as soon as we arrive at a term which is equal
to zero.

In the same way, we can find that the probability for the player B
to win is expressed by an analogous sum:

$$Q = \frac{a}{a+b}\left[\frac{b}{a+b-1} + \frac{b(b-1)(b-2)}{(a+b-1)(a+b-2)(a+b-3)} + \cdots\right].$$

But one of the players, A or B, must win the game, and the winning of
the game by A and by B are opposite events. Hence,

$$P + Q = 1$$

or, after substituting the above expressions for P and Q and after obvious
simplifications,

$$1 + \frac{b}{a+b-1} + \frac{b(b-1)}{(a+b-1)(a+b-2)} + \cdots = \frac{a+b}{a}.$$

This is a noteworthy identity, obtained, as we see, by the principles 
of the theory of probability. Of course, it can be proved in a direct 
way, and it would be a good problem for students to attempt a direct 
proof. There are many cases in which, by means of considerations 
belonging to the theory of probability, several identities or inequalities 
can be established whose direct proof sometimes involves considerable 
difficulty. 
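The identity can at least be confirmed numerically before attempting a direct proof. This Python sketch (helper names are ours) computes the exact distribution of the draw on which the first white ball appears, splits it between A (odd draws) and B (even draws), and evaluates the left side of the identity term by term.

```python
from fractions import Fraction

def first_white_distribution(a, b):
    """Probabilities that the first white ball appears on draw 1, 2, ..., b + 1."""
    probs, all_black = [], Fraction(1)
    for k in range(b + 1):                       # k black balls already drawn
        remaining = a + b - k
        probs.append(all_black * Fraction(a, remaining))
        all_black *= Fraction(b - k, remaining)
    return probs

def win_probabilities(a, b):
    """(P, Q): A makes the odd-numbered draws, B the even-numbered ones."""
    probs = first_white_distribution(a, b)
    return sum(probs[0::2]), sum(probs[1::2])

def identity_lhs(a, b):
    """1 + b/(a+b-1) + b(b-1)/((a+b-1)(a+b-2)) + ..., up to the last nonzero term."""
    total = term = Fraction(1)
    for j in range(b):
        term *= Fraction(b - j, a + b - 1 - j)
        total += term
    return total

P, Q = win_probabilities(2, 3)
print(P, Q, identity_lhs(2, 3))                  # → 3/5 2/5 5/2
```

For a = 2, b = 3 the left side is 5/2 = (a + b)/a, as the identity asserts.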

9. Problem 16. Each of k urns contains n identical balls numbered 
from 1 to n. One ball is drawn from every urn. What is the probability 
that m is the greatest number drawn? 

Solution. Let us denote by $P_m$ the required probability. It is not
apparent how we can find the explicit expression for this probability, but
using the theorems of total and compound probability, we can form
equations which yield the desired expression for $P_m$ without any difficulty.
To this end, let us first find the probability P that the greatest number
drawn does not exceed m. It is obvious that this may happen in m
mutually exclusive ways; namely, when the greatest number drawn is
1, 2, 3, and so on up to m. The probabilities of these different hypotheses
being $P_1, P_2, \ldots, P_m$, their sum gives the following first expression for
P:

$$P = P_1 + P_2 + \cdots + P_m. \qquad (1)$$

We can find a second expression for P using the theorem of compound
probability; namely, the greatest number drawn does not exceed
m if the balls drawn from all urns have numbers from 1 to m. The
probability of drawing a ball with one of the numbers 1, 2, 3, ..., m from
any urn is m/n. The event that this happens for every urn is a
compound event consisting of k independent events with the same
probability m/n. Therefore, by the theorem of compound probability,

$$P = \frac{m^k}{n^k}.$$

And this compared with (1) gives the equation

$$P_1 + P_2 + \cdots + P_m = \frac{m^k}{n^k}. \qquad (2)$$

Substituting m − 1 for m in this equation, we get

$$P_1 + P_2 + \cdots + P_{m-1} = \frac{(m-1)^k}{n^k},$$

and it suffices to subtract this from (2) to have the required expression for
$P_m$:

$$P_m = \frac{m^k - (m-1)^k}{n^k}.$$
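A direct enumeration confirms the result for small cases. In this Python sketch (function names are ours), all $n^k$ equally likely outcomes of drawing one ball from each of k urns are listed, and the share whose maximum equals m is compared with $(m^k - (m-1)^k)/n^k$.

```python
from fractions import Fraction
from itertools import product

def p_max_formula(m, n, k):
    """P_m = (m^k - (m-1)^k) / n^k."""
    return Fraction(m**k - (m - 1)**k, n**k)

def p_max_brute(m, n, k):
    """Enumerate all n**k equally likely results of drawing one ball from each of k urns."""
    outcomes = list(product(range(1, n + 1), repeat=k))
    return Fraction(sum(max(o) == m for o in outcomes), len(outcomes))

print(p_max_formula(4, 6, 3))   # → 37/216
```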

10. Problem 17. Two persons, A and B, have respectively n + 1
and n coins, which they toss simultaneously. What is the probability
that A will have more heads than B?

Solution. Let μ, ν and μ′, ν′ be the numbers of heads and tails thrown
by A and B, respectively, so that μ + ν = n + 1, μ′ + ν′ = n. The
required probability P is the probability of the inequality μ > μ′. The
probability 1 − P of the opposite event μ ≤ μ′ is at the same time
the probability of the inequality ν > ν′ (for μ ≤ μ′ means n + 1 − ν ≤ n − ν′,
that is, ν ≥ ν′ + 1); that is, 1 − P is the probability that A will throw more
tails than B. By reason of symmetry 1 − P = P, whence

$$P = \frac{1}{2}.$$
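The symmetry argument can be confirmed by brute force for small n. This Python sketch (our own helper, assuming fair coins) enumerates all $2^{2n+1}$ equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

def prob_A_more_heads(n):
    """A tosses n + 1 fair coins and B tosses n; all outcomes are equally likely."""
    favorable = total = 0
    for a_coins in product((0, 1), repeat=n + 1):     # 1 marks a head
        for b_coins in product((0, 1), repeat=n):
            total += 1
            favorable += sum(a_coins) > sum(b_coins)
    return Fraction(favorable, total)

print({prob_A_more_heads(n) for n in range(1, 6)})    # → {Fraction(1, 2)}
```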

11. Problem 18. Three players A, B, and C agree to play a series of
games observing the following rules: two players participate in each game,
while the third is idle, and the game is to be won by one of them. The
loser in each game quits, and his place in the next game is taken by the
player who was idle. The player who succeeds in winning over both
of his opponents without interruption wins the whole series of games.
Supposing that the probability for each player to win a single game is
$\frac{1}{2}$ and that the first game is played by A and B, find the probability for
A, B, and C, respectively, to win the whole series, if (a) the number of
games to be played is limited and may not exceed a given number n;
if (b) the number of games is unlimited.

Solution. Let $P_n$, $Q_n$, $R_n$ be the probabilities for A, B, and C,
respectively, to win a series of games when their number cannot exceed n. By
reason of symmetry, $P_n = Q_n$, so that it remains to find $P_n$ and $R_n$.
The player A can win the whole series of games in two mutually exclusive




ways: if he wins the first game, or if he loses the first game. Let the
probability of the first case be $p_n$ and that of the second $r_n$. Then

$$P_n = p_n + r_n.$$

A can win the whole series after winning the first game in two mutually
exclusive ways: (a) if he wins over B and C in succession; (b) if he wins
the first game from B and loses the second game to C, then C loses the third
game to B, and A wins the fourth game over B and afterwards the whole series,
the fourth game playing the part of the first game of a new series of not
more than n − 3 games. Now, the probability of case (a) is
$\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$ by the theorem of compound probability;
that of case (b) by the same theorem is $\frac{1}{8}p_{n-3}$; and the total probability is

$$p_n = \frac{1}{4} + \frac{1}{8}\,p_{n-3}. \qquad (1)$$

If A loses the first game to B but wins the whole series, then in the
second game C wins over B, while the third game is won by A and begins a
new series of not more than n − 2 games. Hence,

$$r_n = \frac{1}{4}\,p_{n-2}. \qquad (2)$$

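Since evidently $p_1 = 0$ and $p_2 = p_3 = \frac{1}{4}$, equation (1) by successive
substitutions yields

$$p_{3k} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right),$$

$$p_{3k+1} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right),$$

$$p_{3k+2} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k}}\right),$$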

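or, in condensed form for an arbitrary n,

$$p_n = \frac{2}{7}\left(1 - 8^{-\left[\frac{n+1}{3}\right]}\right),$$

denoting by [x] the greatest integer contained in x. Hence, by virtue of
(2), the general expression of $r_n$ will be

$$r_n = \frac{1}{14}\left(1 - 8^{-\left[\frac{n-1}{3}\right]}\right)$$

and that of $P_n$, $Q_n$,

$$P_n = Q_n = \frac{5}{14} - \frac{2}{7}\,8^{-\left[\frac{n+1}{3}\right]} - \frac{1}{14}\,8^{-\left[\frac{n-1}{3}\right]}.$$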

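Finally, to find the probability for C to win, we observe that this can
happen only if C wins the second game, after which C is in the position of a
player who has won the first game of a series of not more than n − 1 games; hence,

$$R_n = p_{n-1} = \frac{2}{7}\left(1 - 8^{-\left[\frac{n}{3}\right]}\right).$$

Since $P_n + Q_n + R_n < 1$, the difference

$$1 - P_n - Q_n - R_n = \frac{4}{7}\,8^{-\left[\frac{n+1}{3}\right]} + \frac{2}{7}\,8^{-\left[\frac{n}{3}\right]} + \frac{1}{7}\,8^{-\left[\frac{n-1}{3}\right]}$$

represents the probability of a tie in n games. This probability decreases
rapidly when n increases, so that in a long series of games a tie is
practically impossible. If the number of games is not limited, the
probabilities P, Q, R for A, B, C, respectively, to win are obtained as limits of
$P_n$, $Q_n$, $R_n$ when n increases indefinitely. Thus

$$P = Q = \frac{5}{14}, \qquad R = \frac{2}{7}.$$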
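The closed forms can be checked against a direct simulation of the rules. This Python sketch (function names are ours) tracks, after each game, the exact probability of every non-terminal state (winner of the last game, player who meets him next), assumes each single game is won with probability 1/2, and compares the result with the formulas for $P_n$, $R_n$, and the tie probability.

```python
from fractions import Fraction

HALF = Fraction(1, 2)                       # each single game is won with probability 1/2

def series_probabilities(n):
    """Exact probabilities that A, B, C win the series within at most n games."""
    wins = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
    # State after a game: (winner of that game, player who plays him next).
    # After game 1 (A against B) the idle player C is next in either case.
    states = {("A", "C"): HALF, ("B", "C"): HALF} if n >= 1 else {}
    for _ in range(2, n + 1):
        new_states = {}
        for (champ, challenger), prob in states.items():
            last_loser = ({"A", "B", "C"} - {champ, challenger}).pop()
            wins[champ] += prob * HALF      # a second straight win ends the series
            key = (challenger, last_loser)  # otherwise the last loser comes back in
            new_states[key] = new_states.get(key, Fraction(0)) + prob * HALF
        states = new_states
    return wins

def inv8(j):
    """8 ** (-j) as an exact fraction."""
    return Fraction(1, 8) ** j

def closed_forms(n):
    """P_n (= Q_n), R_n and the tie probability from the formulas in the text (n >= 1)."""
    P = Fraction(5, 14) - Fraction(2, 7) * inv8((n + 1) // 3) - Fraction(1, 14) * inv8((n - 1) // 3)
    R = Fraction(2, 7) * (1 - inv8(n // 3))
    tie = (Fraction(4, 7) * inv8((n + 1) // 3) + Fraction(2, 7) * inv8(n // 3)
           + Fraction(1, 7) * inv8((n - 1) // 3))
    return P, R, tie

print(series_probabilities(10), closed_forms(10))
```

Integer division `//` plays the role of the bracket [x] for nonnegative arguments.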


Problems for Solution

1. Three urns contain respectively 1 white and 2 black balls; 3 white and 1 black
ball; 2 white and 3 black balls. One ball is taken from each urn. What is the
probability that among the balls drawn there are 2 white and 1 black? Ans. 23/60.

2. Cards are drawn one by one from a full deck. What is the probability that
10 cards will precede the first ace? Ans. 164/4165 = 0.03938.

3. Urn 1 contains 10 white and 3 black balls; urn 2 contains 3 white and 5 black
balls. Two balls are transferred from No. 1 and placed in No. 2 and then one ball is
taken from the latter. What is the probability that it is a white ball? Ans. 59/130.

4. Two urns identical in appearance contain respectively 3 white and 2 black balls;
2 white and 5 black balls. One urn is selected and a ball taken from it. What is the
probability that this ball is white? Ans. 31/70.
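The answers to Problems 1 to 4 can be reproduced with a few lines of exact arithmetic. The Python sketch below is ours; for Problem 4 it assumes each urn is selected with probability 1/2.

```python
from fractions import Fraction
from math import comb

# 1. One ball from each of three urns; chances of white are 1/3, 3/4, 2/5.
w = [Fraction(1, 3), Fraction(3, 4), Fraction(2, 5)]
ans1 = sum(w[i] * w[j] * (1 - w[3 - i - j]) for i, j in [(0, 1), (0, 2), (1, 2)])

# 2. The first 10 cards contain no ace and the 11th card is an ace.
ans2 = Fraction(comb(48, 10), comb(52, 10)) * Fraction(4, 42)

# 3. k white balls among the 2 transferred; urn 2 then holds 3 + k white of 10.
ans3 = sum(Fraction(comb(10, k) * comb(3, 2 - k), comb(13, 2)) * Fraction(3 + k, 10)
           for k in range(3))

# 4. Each urn is chosen with probability 1/2 (assumed).
ans4 = Fraction(1, 2) * Fraction(3, 5) + Fraction(1, 2) * Fraction(2, 7)

print(ans1, ans2, ans3, ans4)   # → 23/60 164/4165 59/130 31/70
```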

5. What is the probability that 5 tickets drawn in the French lottery all have
one-digit numbers? Ans. $\dfrac{7}{2441626} \approx 2.9\cdot 10^{-6}$.

6. What is the probability that each of the four players in a bridge game will get a
complete suit of cards? Ans. $\dfrac{24\,(13!)^4}{52!} = 4.474\cdot 10^{-28}$.

7. What is the probability that at least one of the players in a bridge game will
get a complete suit of cards? Ans.

$$\frac{16\cdot 13!\cdot 39! - 72\cdot (13!)^2\cdot 26! + 72\cdot (13!)^4}{52!} = 2.52\cdot 10^{-11}.$$

See Sec. 5, page 31.

8. From an urn with a white and b black balls n balls are taken. Find the
probability of drawing at least one white ball. Ans. The required probability can be
expressed in two ways. First expression:

$$1 - \frac{b(b-1)\cdots(b-n+1)}{(a+b)(a+b-1)\cdots(a+b-n+1)}.$$

Second expression:

$$\frac{a}{a+b}\left[1 + \frac{b}{a+b-1} + \frac{b(b-1)}{(a+b-1)(a+b-2)} + \cdots + \frac{b(b-1)\cdots(b-n+2)}{(a+b-1)(a+b-2)\cdots(a+b-n+1)}\right].$$

Equating them, we have an identity

$$1 + \frac{b}{a+b-1} + \cdots + \frac{b(b-1)\cdots(b-n+2)}{(a+b-1)\cdots(a+b-n+1)} = \frac{a+b}{a}\left[1 - \frac{b(b-1)\cdots(b-n+1)}{(a+b)(a+b-1)\cdots(a+b-n+1)}\right].$$
9. Three players A, B, C in turn draw balls from an urn with 10 white and 10 black
balls, taking one ball at a time. He who extracts the first white ball wins the game.
Supposing that they start in the order A, B, C, find the probabilities for each of them
to win the game. Ans. For A, 0.56584; for B, 0.29144; for C, 0.14271.
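The three numbers follow from the same reasoning as Problem 15, with the draws now shared among three players. This Python sketch (function name ours) computes the exact probability that the first white ball appears on each draw and credits it to the player making that draw.

```python
from fractions import Fraction

def turn_winners(white, black, players=3):
    """Exact chance that each player (in turn order) extracts the first white ball."""
    wins = [Fraction(0)] * players
    all_black = Fraction(1)                  # chance that the first k draws were all black
    total = white + black
    for k in range(black + 1):               # the (k + 1)-th draw is made by player k mod 3
        wins[k % players] += all_black * Fraction(white, total - k)
        all_black *= Fraction(black - k, total - k)
    return wins

print([round(float(w), 5) for w in turn_winners(10, 10)])   # → [0.56584, 0.29144, 0.14271]
```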

10. If n dice are thrown at a time, what is the probability of having each of the
points 1, 2, ..., 6 appear at least once? Find the numerical value of this
probability for n = 10. Ans.

$$p_n = 1 - 6\left(\frac{5}{6}\right)^n + 15\left(\frac{4}{6}\right)^n - 20\left(\frac{3}{6}\right)^n + 15\left(\frac{2}{6}\right)^n - 6\left(\frac{1}{6}\right)^n; \qquad p_{10} = 0.2718.$$

Hint: Use the formula in Sec. 5, page 31.
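A minimal Python sketch of the inclusion-exclusion sum in the answer (the function name and the `faces` parameter are ours), evaluated exactly with fractions.

```python
from fractions import Fraction
from math import comb

def all_faces_prob(n, faces=6):
    """Inclusion-exclusion over the set of faces that never appear among n dice."""
    return sum((-1)**j * comb(faces, j) * Fraction(faces - j, faces)**n
               for j in range(faces + 1))

print(round(float(all_faces_prob(10)), 4))   # → 0.2718
```

As expected, the sum is exactly 0 for n < 6 and equals 6!/6^6 for n = 6.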

11. In a lottery m tickets are drawn at a time out of the total number of n tickets,
and returned before the next drawing is made. What is the probability that in k
drawings each of the numbers 1, 2, ..., n will appear at least once? Ans.

$$P_k = 1 - \frac{n}{1}\left(\frac{n-m}{n}\right)^k + \frac{n(n-1)}{1\cdot 2}\left(\frac{n-m}{n}\cdot\frac{n-m-1}{n-1}\right)^k - \cdots.$$


12. We have k varieties of objects, each variety consisting of the same number of
objects. These objects are drawn one at a time and replaced before the next drawing.
Find the probability that n drawings, and no fewer, will be required to produce objects of
all varieties. Ans.

$$P_n = \frac{1}{k^{n-1}}\left[(k-1)^{n-1} - \frac{k-1}{1}(k-2)^{n-1} + \frac{(k-1)(k-2)}{1\cdot 2}(k-3)^{n-1} - \cdots\right].$$

13. Three urns contain respectively 1 white, 2 black balls; 2 white, 1 black balls;
2 white, 2 black balls. One ball is transferred from the first urn into the second; then
one from the latter is transferred into the third; finally, one ball is drawn from the
third urn. What is the probability of its being white? Ans. 31/60.

14. Each of n urns contains a white and b black balls. One ball is transferred
from the first urn into the second, then one ball from the latter into the third, and so
on. Finally, one ball is taken from the last urn. What is the probability of its being
white? Ans. Denote by $p_k$ the probability of drawing a white ball from the kth urn.
Then

$$p_{k+1} = \frac{a+1}{a+b+1}\,p_k + \frac{a}{a+b+1}\,(1 - p_k)$$

for k = 1, 2, ..., n − 1. Hence,

$$p_n = \frac{a}{a+b}.$$
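Problems 13 and 14 use the same step: an urn receives one ball that is white with some known probability, and then one ball is drawn from it. The Python sketch below (helper names are ours) packages that step and chains it through the urns.

```python
from fractions import Fraction

def pass_one_ball(urn_white, urn_black, p_incoming_white):
    """P(white) for a ball drawn from an urn after one ball (white w.p. p) is added."""
    total = urn_white + urn_black + 1
    return (p_incoming_white * Fraction(urn_white + 1, total)
            + (1 - p_incoming_white) * Fraction(urn_white, total))

# Problem 13: urns (1 W, 2 B), (2 W, 1 B), (2 W, 2 B).
p = Fraction(1, 3)                 # the ball leaving urn 1 is white
p = pass_one_ball(2, 1, p)         # the ball leaving urn 2 is white
p13 = pass_one_ball(2, 2, p)       # the ball finally drawn from urn 3 is white

# Problem 14: n identical urns with a white and b black balls each.
def chain(a, b, n):
    p = Fraction(a, a + b)          # p_1
    for _ in range(n - 1):          # p_{k+1} from p_k by the recurrence in the answer
        p = pass_one_ball(a, b, p)
    return p

print(p13, chain(3, 5, 6))          # → 31/60 3/8
```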

15. Two players A and B toss two dice, A starting the game. The game is won
by A if he casts 6 points before B casts 7 points; and it is won by B if he casts 7 points
before A casts 6 points. What are the probabilities for A and B to win the game if
they agree to cast dice not more than n times? What is the probability of a tie?
Ans. Probability for A:

$$p_n = \frac{30}{61}\left[1 - \left(\frac{155}{216}\right)^m\right] \text{ if } n = 2m; \qquad p_n = \frac{30}{61}\left[1 - \left(\frac{155}{216}\right)^{m+1}\right] \text{ if } n = 2m+1.$$

Probability for B:

$$q_n = \frac{31}{61}\left[1 - \left(\frac{155}{216}\right)^m\right] \text{ if } n = 2m \text{ or } n = 2m+1.$$

Probability of a tie:

$$r_n = \left(\frac{155}{216}\right)^m \text{ if } n = 2m; \qquad r_n = \frac{31}{36}\left(\frac{155}{216}\right)^m \text{ if } n = 2m+1.$$

If n increases indefinitely, $r_n$ converges to 0 and $p_n$, $q_n$ converge to the limits

$$p = \frac{30}{61}, \qquad q = \frac{31}{61},$$

which may be considered as the probabilities for A and B to win if the number of 
throws is unlimited. 
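The formulas of Problem 15 can be checked by following the throws one at a time. This Python sketch (names ours) assumes fair dice, so 6 points come up with probability 5/36 and 7 points with probability 6/36, and compares the exact result with the stated answers.

```python
from fractions import Fraction

P6, P7 = Fraction(5, 36), Fraction(6, 36)   # chances of 6 and of 7 points with two fair dice

def dice_game(n):
    """Exact (A wins, B wins, tie) when at most n throws are made, A throwing first."""
    a_win = b_win = Fraction(0)
    still_going = Fraction(1)
    for throw in range(n):
        if throw % 2 == 0:                  # A's throw
            a_win += still_going * P6
            still_going *= 1 - P6
        else:                               # B's throw
            b_win += still_going * P7
            still_going *= 1 - P7
    return a_win, b_win, still_going

def formulas(n):
    """p_n, q_n, r_n as given in the answer."""
    m, odd = divmod(n, 2)
    x = Fraction(155, 216)
    p = Fraction(30, 61) * (1 - x ** (m + odd))
    q = Fraction(31, 61) * (1 - x ** m)
    r = x ** m * (Fraction(31, 36) if odd else 1)
    return p, q, r

print(dice_game(40), formulas(40))
```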

16. The game known as “craps” is played with two dice, and the caster wins
unconditionally if he produces 7 or 11 points (which are called “naturals”); he loses
the game in case of 2, 3, or 12 points (called “craps”). But if he produces 4, 5, 6, 8, 9,
or 10 points, he has the right to cast the dice steadily until he throws the same
number of points he had before or until he throws a 7. If he rolls 7 before obtaining his
point, he loses the game; otherwise, he wins. What is the probability of winning?

Ans. 244/495 = 0.493.
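The craps answer follows from the theorem of total probability over the first throw. Once a point is set, only the point (win) or a 7 (loss) ends the game, so the chance of making the point is $P(\text{point})/(P(\text{point}) + P(7))$. A Python sketch assuming fair dice:

```python
from fractions import Fraction
from itertools import product

# Distribution of the sum of two fair dice.
ways = {}
for d1, d2 in product(range(1, 7), repeat=2):
    ways[d1 + d2] = ways.get(d1 + d2, 0) + 1
p = {s: Fraction(w, 36) for s, w in ways.items()}

def craps_win_probability():
    win = p[7] + p[11]                       # naturals on the first throw
    for point in (4, 5, 6, 8, 9, 10):
        # Once a point is set, only `point` (win) or 7 (loss) ends the game.
        win += p[point] * p[point] / (p[point] + p[7])
    return win

print(craps_win_probability())  # → 244/495
```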

17. Prove directly the identity in Prob. 15, page 37. 

Solution 1. Let

$$\varphi(c, b) = \frac{b}{c} + \frac{b(b-1)}{c(c-1)} + \frac{b(b-1)(b-2)}{c(c-1)(c-2)} + \cdots,$$

where b is a positive integer and c ≥ b. Then

$$\varphi(c, b) = \frac{b}{c}\left[1 + \varphi(c-1, b-1)\right],$$

whence

$$\varphi(c, 1) = \frac{1}{c}, \qquad \varphi(c, 2) = \frac{2}{c-1}, \qquad \varphi(c, 3) = \frac{3}{c-2},$$

and in general

$$\varphi(c, b) = \frac{b}{c-b+1}.$$

Taking c = a + b − 1, we have

$$1 + \varphi(a+b-1, b) = \frac{a+b}{a}.$$

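Solution 2. The polynomial

$$S(x) = 1 + \frac{b}{c}\,x + \frac{b(b-1)}{c(c-1)}\,x^2 + \cdots$$

can be presented in the form of a definite integral

$$S(x) = (c+1)\int_0^1 \bigl(1 - t(1-x)\bigr)^b (1-t)^{c-b}\,dt,$$

whence

$$S(1) = (c+1)\int_0^1 (1-t)^{c-b}\,dt = \frac{c+1}{c-b+1} = \frac{a+b}{a} \quad \text{if } c = a+b-1.$$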

18. Find the approximate expressions for the probabilities P and Q in Prob. 15,
page 36, when b is a large number. Take for numerical application a = b = 50.
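Solution. Since P + Q = 1, it suffices to seek the approximate expression for
P − Q. Now

$$P - Q = \frac{a}{a+b}\left[1 - \frac{b}{a+b-1} + \frac{b(b-1)}{(a+b-1)(a+b-2)} - \cdots\right],$$

which is $\frac{a}{a+b}S(-1)$ in the notation of Solution 2 of Prob. 17 with c = a + b − 1, whence

$$P - Q = a\int_0^1 (1-2t)^b (1-t)^{a-1}\,dt = \frac{a}{2}\int_0^1 (1-u)^b\left(1-\frac{u}{2}\right)^{a-1}du + \frac{(-1)^b}{2^a C_{a+b}^a},$$

the last term being the part of the integral in which u = 2t lies between 1 and 2.
To find the approximate expression of the remaining integral, we set

$$(1-u)^b\left(1-\frac{u}{2}\right)^{a-1} = e^{-v},$$

whence u can be expressed as a power series in v:

$$u = \frac{2}{2b+a-1}\,v - \frac{4b+a-1}{(2b+a-1)^3}\,v^2 + \frac{12b^2 + (2b+a-1)^2}{3(2b+a-1)^5}\,v^3 - \cdots.$$

Substituting the resulting expression of du/dv and integrating with respect to v
between limits 0 and ∞, we obtain for P − Q an asymptotic expansion whose first
terms are

$$P - Q \approx \frac{a}{2b+a-1}\left[1 - \frac{4b+a-1}{(2b+a-1)^2}\right] + \frac{a\left[12b^2 + (2b+a-1)^2\right]}{(2b+a-1)^5} + \frac{(-1)^b}{2^a C_{a+b}^a}.$$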

A more detailed discussion reveals that the error of this approximate formula is less 

.u . .. «I40(g - D* - 66(a - 1) + 326*] 

than g(^)*^H?^)® ‘ and greater than-——-— - provided 

(26 -f a — ir 

b ≥ 12. For a = b = 50 the formula yields

$$P - Q = 0.3318; \qquad P = 0.6659; \qquad Q = 0.3341.$$
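To see how good the three-term approximation is, one can compare it with the exact alternating sum for P − Q, which is available in closed form from Problem 15. This Python sketch (function names ours) writes s for 2b + a − 1 and evaluates both for a = b = 50.

```python
from fractions import Fraction
from math import comb

def exact_P_minus_Q(a, b):
    """a/(a+b) * [1 - b/(a+b-1) + b(b-1)/((a+b-1)(a+b-2)) - ...], exactly."""
    total = term = Fraction(1)
    for j in range(b):
        term *= -Fraction(b - j, a + b - 1 - j)
        total += term
    return Fraction(a, a + b) * total

def approx_P_minus_Q(a, b):
    """First terms of the asymptotic expansion, with s = 2b + a - 1."""
    s = 2 * b + a - 1
    return (a / s * (1 - (4 * b + a - 1) / s**2)
            + a * (12 * b**2 + s**2) / s**5
            + (-1)**b / (2**a * comb(a + b, a)))

exact = float(exact_P_minus_Q(50, 50))
approx = approx_P_minus_Q(50, 50)
print(round(exact, 4), round(approx, 4))   # → 0.3318 0.3318
```

For these values the two agree to about six decimal places.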
CHAPTER III


REPEATED TRIALS 

1. In the theory of probability the word “trial” means an attempt to
produce, in a manner precisely described, an event E which is not certain.
The outcome of a trial is called a “success” if E occurs, and a “failure” if
E fails to occur. For instance, if E represents the drawing of two cards
of the same denomination from a full pack of cards, the “trial” consists
in taking any two cards from the full pack, and we have a success or
failure in this trial according to whether both cards are of the same
denomination or not.

If trials can be repeated, they form a “series” of trials. Regarding 
series of trials, the following two problems naturally arise: 

a. What is the probability of a given number of successes in a given
series of trials? And as a generalization of this problem:

b. What is the probability that the number of successes will be
contained between two given limits in a given series of trials?

Problems of this kind are among the most important in the theory of 
probability. 

2. Trials are said to be “independent” in regard to an event E if 
the probability of this event in any trial remains the same, whether 
the results of any number of other trials are known or not. On the other
hand, trials are “dependent” if the probability of E in a certain trial
varies according to the information we have about the outcome of one or 
more of the other trials. 

As an example of independent trials, imagine that several times in 
succession we draw one ball from an urn containing white and black balls 
in given proportion, after each trial returning the ball that has been 
drawn, and thoroughly mixing the balls before proceeding to the next 
trial. With respect to the color of the balls taken, we may reasonably 
assume that these trials are independent. On the other hand, if the 
balls already extracted are not returned to the urn, the above described 
trials are no longer independent. To illustrate, suppose that the urn 
from which the balls are drawn, originally contained 2 white and 3 black 
balls, and that 4 balls are drawn. What is the probability that the 
third ball is white? If nothing is known about the color of the three
other balls, the probability is 2/5. If we know that the first ball is white,
but the colors of the second and fourth balls are unknown, this probability
is 1/4. In general, the probability for any ball to be white (or black)
depends essentially on the amount of information we possess about the
color of the other balls. Since the urn contains a limited number of 
balls, series of trials of this kind cannot be continued indefinitely. 

As an example of an indefinite series of dependent trials, suppose that 
we have two urns, the first containing 1 white and 2 black balls, and the 
second, 1 black and 2 white balls, and the trials consist in taking one 
ball at a time from either urn, observing the following rules: (a) the 
first ball is taken from the first urn; (b) after a white ball, the next is
taken from the first urn; after a black one, the next is taken from the
second urn; (c) balls are returned to the same urns from which they were
taken.

Following these rules, we evidently have a definite series of trials,
which can be extended indefinitely, and these trials are dependent.
For if we know that a certain ball was white or black, the probability
of the next ball being white is 1/3 or 2/3, respectively.

Assuming the independence of trials, the probability of an event E 
may remain constant or may vary from one trial to another. If an 
unbiased coin is tossed several times, we have a series of independent 
trials each with the same probability, 1/2, for heads. It is easy to give
an example of a series of independent trials with variable probability for
the same event. Imagine, for instance, that we have an unlimited
number of urns with white and black balls, but that the proportion of
white and black balls varies from urn to urn. One ball is drawn
successively from each of these urns. Evidently, here we have a series of
trials independent in regard to the white color of the ball drawn, but 
with the probability of drawing a ball of this color varying from trial to 
trial. 

In this chapter we shall discuss the simplest case of series of
independent trials with constant probability, which are often called
“Bernoullian series of trials” in honor of Jacob Bernoulli, who, in his
classical book “Ars conjectandi” (1713), made a profound study of such series
and was led to the discovery of one of the most important theorems in
the theory of probability.

3. Considering a series of n independent trials in which the probability
of an event E is p in every trial (that of the opposite event F being
q = 1 − p), the first problem which presents itself is to find the
probability that E will occur exactly m times, where m is one of the numbers
0, 1, 2, ..., n. In what follows, we shall denote this probability by $T_m$.
In the extreme cases m = n and m = 0 it is easy to find $T_n$ and $T_0$.
When m = n, the event E must occur n times in succession, so that $T_n$
represents the probability of the compound event EE⋯E with n
identical components. These components are independent events, since
the trials are independent, and the probability of each of them is p.
Hence, the compound probability is

$$T_n = p\cdot p\cdot p\cdots p \quad (n \text{ times})$$

or

$$T_n = p^n.$$

The symbol $T_0$ denotes the probability that E will never occur in n
trials, which is the same as to say that F will occur n times in succession.
Hence, for the same reasons as before,

$$T_0 = q^n = (1-p)^n.$$

When m is neither 0 nor n, the event consisting in m occurrences of E 
can materialize in several mutually exclusive forms, each of which may 
be represented by a definite succession of m letters E and n — m letters F. 
For example, if n = 4 and m = 2, we can distinguish the following mutually
exclusive forms corresponding to two occurrences of E:

EEFF, EFEF, EFFE, FEEF, FEFE, FFEE. 


To find the number of all the different successions consisting of m
letters E and n − m letters F, we observe that any such succession is
determined as soon as we know the places occupied by the letter E.
Now the number of ways to select m places out of the total number of
n places is evidently the number of combinations of n objects taken
m at a time. Hence, the number of mutually exclusive ways to have
m successes in n trials is

$$C_n^m = \frac{n(n-1)\cdots(n-m+1)}{1\cdot 2\cdot 3\cdots m}.$$

The probability of each succession of m letters E and n − m letters F,
by reason of independence of trials, is represented by the product of
m factors p and n − m factors q, and since the product does not depend
upon the order of factors, this probability will be

$$p^m q^{n-m}$$

for each succession. Hence, the total probability of m successes in n
trials is given by this simple formula:

$$T_m = \frac{n(n-1)\cdots(n-m+1)}{1\cdot 2\cdots m}\,p^m q^{n-m}, \qquad (1)$$


which can also be presented thus: 


$$T_m = \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}. \qquad (2)$$
This second form can be used even for m = 0 or m = n if, as usual,
we assume 0! = 1. Either of the expressions (1) or (2) shows that $T_m$
may be considered as the coefficient of $t^m$ in the expansion of

$$(q + pt)^n$$

according to ascending powers of an arbitrary variable t. In other
words, we have identically

$$(q + pt)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n.$$

For this reason the function

$$(q + pt)^n$$

is called the “generating function” of probabilities $T_0, T_1, T_2, \ldots, T_n$.
By setting t = 1 we naturally obtain

$$T_0 + T_1 + T_2 + \cdots + T_n = 1.$$
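The generating-function statement can be checked directly: expanding $(q + pt)^n$ by repeated polynomial multiplication must reproduce the values of $T_m$ from formula (2). A Python sketch (function names ours):

```python
from fractions import Fraction
from math import comb

def binomial_probs(n, p):
    """T_m = C(n, m) p^m q^(n-m) for m = 0..n, formula (2)."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

def expand_generating_function(n, p):
    """Coefficients of (q + p t)^n, built by multiplying by (q + p t) n times."""
    q = 1 - p
    coeffs = [Fraction(1)]                  # the constant polynomial 1
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += c * q                 # the term multiplied by q
            nxt[i + 1] += c * p             # the term multiplied by p t
        coeffs = nxt
    return coeffs

T = binomial_probs(4, Fraction(1, 3))
print(T == expand_generating_function(4, Fraction(1, 3)), sum(T))  # → True 1
```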

4. The probability P(k, l) that the number of successes m will satisfy
the inequalities (or, simply, the probability of these inequalities)

$$k \le m \le l,$$

where k and l are two given integers, can easily be found by distinguishing
the following mutually exclusive events:

$$m = k \text{ or } m = k+1, \ \ldots, \text{ or } m = l.$$

Accordingly, by the theorem of total probability,

$$P(k, l) = T_k + T_{k+1} + \cdots + T_l,$$

or, using expression (2),

$$P(k, l) = \sum_{m=k}^{l} \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}.$$

In particular, the probability that the number of successes will not
be greater than l is represented by the sum

$$P(0, l) = q^n + \frac{n}{1}\,p\,q^{n-1} + \cdots + \frac{n(n-1)\cdots(n-l+1)}{1\cdot 2\cdots l}\,p^l q^{n-l}.$$

Similarly, the probability that the number of successes in n trials will
not be less than l can be presented thus:

$$P(l, n) = \frac{n(n-1)\cdots(n-l+1)}{1\cdot 2\cdots l}\,p^l q^{n-l}\left[1 + \frac{n-l}{l+1}\,\frac{p}{q} + \frac{(n-l)(n-l-1)}{(l+1)(l+2)}\left(\frac{p}{q}\right)^2 + \cdots\right],$$


where the series in the brackets ends by itself. 
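Both expressions can be checked against each other in exact arithmetic. In this Python sketch (function names ours), `P_range` sums $T_k$ through $T_l$, and `P_at_least` evaluates $T_l$ times the bracketed series, which stops after n − l terms.

```python
from fractions import Fraction
from math import comb

def T(n, m, p):
    return comb(n, m) * p**m * (1 - p)**(n - m)

def P_range(n, k, l, p):
    """P(k, l): probability that k <= m <= l, by the theorem of total probability."""
    return sum(T(n, m, p) for m in range(k, l + 1))

def P_at_least(n, l, p):
    """P(l, n) as T_l times [1 + (n-l)/(l+1) p/q + ...]; requires p < 1."""
    ratio = p / (1 - p)
    term = total = Fraction(1)
    for j in range(n - l):                  # the series ends by itself after n - l terms
        term *= Fraction(n - l - j, l + 1 + j) * ratio
        total += term
    return T(n, l, p) * total

print(P_range(10, 0, 3, Fraction(1, 2)), P_at_least(10, 8, Fraction(1, 2)))  # → 11/64 7/128
```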

5. The application of the above established formulas to numerical
examples does not present any difficulty so long as the numbers with
which we have to deal are not large.

Example 1. In tossing 10 coins, what is the probability of having exactly 5 heads?
Tossing 10 different coins at once is the same thing as tossing one coin 10 times, if all
the coins are unbiased, which is assumed. Hence, the required probability is given
by formula (1), where we must take n = 10, m = 5, p = q = 1/2, and it is

$$\frac{10\cdot 9\cdot 8\cdot 7\cdot 6}{1\cdot 2\cdot 3\cdot 4\cdot 5}\cdot\frac{1}{2^{10}} = \frac{252}{1024} = 0.24609.$$


Example 2. If a person playing a certain game can win one dollar with the probability
1/3, and lose twenty-five cents with the probability 2/3, what is the probability of
winning at least three dollars in 20 games? Let m be the number of times the game is won.
The total gain (considering a loss as a negative gain) will be

$$m - \frac{1}{4}(20 - m) = \frac{5}{4}m - 5 \ \text{dollars}$$

and the condition of the problem requires that it should not be less than three dollars. Hence

$$\frac{5}{4}m - 5 \ge 3,$$

whence m ≥ 6⅖ or, since m is an integer, m ≥ 7. That is, in 20 trials an event with
the probability 1/3 must happen at least 7 times, and the probability for that is

$$\sum_{m=7}^{20} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}.$$

This sum contains 14 terms; but it can be expressed through another sum containing
only 7 terms, because

$$\sum_{m=7}^{20} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m} = 1 - \sum_{m=0}^{6} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}.$$

Using the last expression, one easily gets 0.5207 for the required probability.
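Both examples can be reproduced in a few lines of exact arithmetic. This Python sketch (variable names ours) also recovers the threshold m ≥ 7 directly from the gain condition and confirms that the 14-term sum and the 7-term complement agree.

```python
from fractions import Fraction
from math import comb

# Example 1: exactly 5 heads with 10 fair coins.
ex1 = Fraction(comb(10, 5), 2**10)

# Example 2: smallest number of wins m with a gain of at least three dollars.
threshold = min(m for m in range(21) if m - Fraction(20 - m, 4) >= 3)
p, q = Fraction(1, 3), Fraction(2, 3)
tail_14_terms = sum(comb(20, m) * p**m * q**(20 - m) for m in range(threshold, 21))
via_7_terms = 1 - sum(comb(20, m) * p**m * q**(20 - m) for m in range(threshold))

print(threshold, ex1, round(float(via_7_terms), 4))   # → 7 63/256 0.5207
```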

6. In the series of probabilities

$$T_0, T_1, T_2, \ldots, T_n$$

for 0, 1, 2, ..., n successes in n trials, the terms generally increase till
the greatest term $T_\mu$ is reached, and then they steadily decrease. For
instance, if n = 10, p = q = 1/2, the values of the expression

$$\frac{10!}{m!\,(10-m)!}$$

for m = 0, 1, 2, ..., 10 are

1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1

so that $T_5$ is the greatest term. For obvious reasons the number μ (to
which the greatest term in the series of probabilities $T_0, T_1, \ldots, T_n$
corresponds) is called the “most probable” number of successes.

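To prove this observation in general, and to find the rule for obtaining
μ, we observe first that the quotient

$$\frac{T_{m+1}}{T_m} = \frac{n-m}{m+1}\cdot\frac{p}{q}$$

decreases with increasing m, so that

$$\frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_n}{T_{n-1}}. \qquad (a)$$

The two extreme terms in (a) are

$$\frac{T_1}{T_0} = \frac{np}{q}, \qquad \frac{T_n}{T_{n-1}} = \frac{p}{nq},$$

and if n is large enough, the first of them is > 1 and the last < 1. To
find exactly how large n must be, we notice that

$$\frac{np}{q} > 1 \quad \text{if} \quad np > q = 1 - p,$$

whence

$$n + 1 > \frac{1}{p}.$$

Similarly,

$$\frac{p}{nq} < 1 \quad \text{if} \quad p < nq \ \text{ or } \ 1 - q < nq,$$

whence

$$n + 1 > \frac{1}{q}.$$

Consequently, if n + 1 is greater than both 1/p and 1/q, the first term
in (a) is > 1 and the last term is < 1. As the terms of (a) form a decreasing
sequence, there must be a last term which is ≥ 1. Let it be

$$\frac{T_\mu}{T_{\mu-1}}.$$

Then

$$\frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_\mu}{T_{\mu-1}} \ge 1$$

and

$$1 > \frac{T_{\mu+1}}{T_\mu} > \frac{T_{\mu+2}}{T_{\mu+1}} > \cdots > \frac{T_n}{T_{n-1}},$$

or, which is the same,

$$T_0 < T_1 < T_2 < \cdots < T_{\mu-1} \le T_\mu > T_{\mu+1} > T_{\mu+2} > \cdots > T_n.$$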

In other words, the sequence of probabilities increases till the greatest
term is reached and steadily decreases from then on. Besides $T_\mu$,
there may be another greatest term $T_{\mu-1}$; namely, when $T_{\mu-1} = T_\mu$;
but all the other terms are certainly less than $T_\mu$. The number μ is
perfectly determined by the conditions

$$\frac{n-\mu+1}{\mu}\cdot\frac{p}{q} \ge 1, \qquad \frac{n-\mu}{\mu+1}\cdot\frac{p}{q} < 1,$$

which are equivalent to the two inequalities

$$(n+1)p \ge \mu(p+q), \qquad np - q < \mu(p+q).$$

These in turn, since p + q = 1, can be presented thus:

$$\mu \le (n+1)p < \mu + 1,$$

and show that μ is uniquely determined as the greatest integer contained in
(n + 1)p. If (n + 1)p is an integer, then μ = (n + 1)p and $T_\mu = T_{\mu-1}$.
That is, there are two greatest terms if, and only if, (n + 1)p is an
integer.

Let us consider now what happens if

$$n + 1 \le \frac{1}{p} \quad \text{or} \quad n + 1 \le \frac{1}{q}.$$

In the first case, all the terms in (a) are less than 1 with the single
exception of the first term $T_1/T_0$, which may be equal to 1; namely, when
n + 1 = 1/p. Consequently,

$$T_0 \ge T_1 > T_2 > \cdots > T_n,$$

so that $T_0$ is the greatest term. If (n + 1)p < 1, the greatest integer
contained in (n + 1)p is 0, and there is only one greatest term $T_0$. If,
however, (n + 1)p = 1, there are two terms $T_0 = T_1$ greater than the
others.

If (n + 1)q ≤ 1, all the terms in series (a) are > 1 with the exception
of the last term, which may be equal to 1; namely, when (n + 1)q = 1.
Hence,

$$T_0 < T_1 < \cdots < T_{n-1} \le T_n,$$

so that $T_n$ is the greatest term, and the preceding term $T_{n-1}$ can be equal
to it only if (n + 1)q = 1. Now the condition

$$(n+1)q \le 1$$

is equivalent to

$$(n+1)p \ge n.$$

On the other hand, because p < 1,

$$(n+1)p < n + 1.$$

Therefore n is the greatest integer contained in (n + 1)p.

Comparing the results obtained in the last two cases (excluded at 
first) with the general rule, we see that in all cases the greatest term 
corresponds to 

$$\mu = [(n+1)p].$$

If (n + 1)p is an integer, then there are two greatest terms $T_\mu$ and $T_{\mu-1}$.
This rule for determining the most probable number of successes is very
simple and easy to apply to numerical examples.
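The rule μ = [(n + 1)p] can be tested against a direct search for the largest $T_m$. This Python sketch (function names ours) uses exact fractions so that ties are detected reliably. When (n + 1)p is an integer, `rule` returns the pair μ − 1, μ.

```python
from fractions import Fraction
from math import comb, floor

def most_probable(n, p):
    """All m at which T_m is greatest, found by direct comparison."""
    T = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]
    best = max(T)
    return [m for m, t in enumerate(T) if t == best]

def rule(n, p):
    """mu = [(n + 1) p]; if (n + 1) p is an integer, T_{mu-1} = T_mu as well."""
    x = (n + 1) * p
    mu = floor(x)
    return [mu - 1, mu] if x == mu else [mu]

print(rule(20, Fraction(2, 5)), rule(110, Fraction(1, 3)))   # → [8] [36, 37]
```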


Example 1. Let n = 20, p = 2/5. Then (n + 1)p = 8.4, and the greatest
integer contained in this number is μ = 8. Hence, there is only one most probable
number of successes, μ = 8, with the corresponding probability

$$T_8 = \frac{20!}{8!\,12!}\left(\frac{2}{5}\right)^8\left(\frac{3}{5}\right)^{12} = 0.1797.$$

Example 2. Let n = 110, p = 1/3, q = 2/3, so that (n + 1)p = 37, an integer.
Consequently, 36 and 37 are the most probable numbers of successes with the
corresponding probability

$$T_{36} = T_{37} = \frac{110!}{37!\,73!}\left(\frac{1}{3}\right)^{37}\left(\frac{2}{3}\right)^{73} = 0.0801.$$


7. When n, m, and n − m are large numbers, the evaluation of the
probability $T_m$ by the exact formula

$$T_m = \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}$$

becomes impracticable and it is necessary to resort to approximations.
For approximate evaluation of large factorials we possess precious means
in the famous “Stirling formula.” Referring the reader to Appendix I
where this formula is established, we shall use it here in the following
form:

$$\log x! = \log\sqrt{2\pi x} + x\log x - x + \omega(x).$$

In the same appendix the following double inequality is proved:

$$\frac{1}{12\left(x+\frac{1}{2}\right)} < \omega(x) < \frac{1}{12x}.$$

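Now from Stirling’s formula

$$n! = \sqrt{2\pi n}\;n^n e^{-n+\omega(n)},$$

and two similar expressions for m! and (n − m)! follow. Substituting
them into $T_m$, we get two limits

$$T_m < \sqrt{\frac{n}{2\pi m(n-m)}}\left(\frac{np}{m}\right)^m\left(\frac{nq}{n-m}\right)^{n-m} e^{k}, \qquad (3)$$

$$T_m > \sqrt{\frac{n}{2\pi m(n-m)}}\left(\frac{np}{m}\right)^m\left(\frac{nq}{n-m}\right)^{n-m} e^{l}, \qquad (4)$$

where

$$k = \frac{1}{12n} - \frac{1}{12m+6} - \frac{1}{12(n-m)+6}, \qquad l = \frac{1}{12n+6} - \frac{1}{12m} - \frac{1}{12(n-m)}.$$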

When Tif lUj n m are even moderately large k and I differ little from 
each other. 

Inequalities (3) and (4) then give very close upper and lower limits for $T_m$. To evaluate the powers

$$\left(\frac{np}{m}\right)^{m},\qquad \left(\frac{nq}{n-m}\right)^{n-m}$$

with large exponents, sufficiently extensive logarithmic tables must be available. If such tables are lacking, then in the cases that ordinarily occur, when the ratios np/m and nq/(n − m) are close to 1, we can use special short tables to evaluate the logarithms of these ratios or else resort to series.
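Today the logarithms in (3) and (4) are one function call away. The sketch below (with illustrative names `stirling_limits` and `exact_term`, not from the text) evaluates both limits in logarithmic form, with k and l as defined above, and compares them with $T_m$ computed exactly in integer arithmetic for rational p. It assumes 0 < m < n.

```python
from fractions import Fraction
from math import comb, exp, log, pi


def stirling_limits(n, m, p):
    """Lower limit (4) and upper limit (3) for T_m, with k and l defined as in Sec. 7."""
    q = 1.0 - p
    k = 1 / (12 * n + 6) - 1 / (12 * m + 6) - 1 / (12 * (n - m) + 6)
    l = 1 / (12 * n) - 1 / (12 * m) - 1 / (12 * (n - m))
    # Logarithm of the factor common to (3) and (4); working with logarithms avoids
    # overflow and underflow in the powers (np/m)^m and (nq/(n - m))^(n - m).
    log_common = (0.5 * log(n / (2 * pi * m * (n - m)))
                  + m * log(n * p / m)
                  + (n - m) * log(n * q / (n - m)))
    return exp(log_common + l), exp(log_common + k)


def exact_term(n, m, p):
    """T_m computed exactly for rational p = a/b; only the final division rounds."""
    p = Fraction(p)
    a, b = p.numerator, p.denominator
    return comb(n, m) * a**m * (b - a) ** (n - m) / b**n


lo, hi = stirling_limits(9000, 3091, 1 / 3)
print(f"{lo:.10f} < T_3091 < {hi:.10f}")
print(f"exact T_3091 = {exact_term(9000, 3091, Fraction(1, 3)):.10f}")
```

For n = 9000, m = 3091 the two limits differ only in the ninth significant digit, which illustrates the remark that k and l differ little from each other.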

8. Another problem, requiring the probability that the number of successes will be contained between two given limits, is much more complex when the number of trials, as well as the difference between the given limits, is a large number. Ordinarily, simple and convenient formulas are used for the approximate evaluation of this probability under such circumstances. These formulas are derived in Chap. VII. Less known is Markoff's ingenious use of continued fractions for that purpose.

It suffices to devise a method for the approximate evaluation of the probability that the number of successes will be greater than a given integer l, which can be supposed > np. We shall denote this probability by P(l). A similar notation Q(l) will be used to denote the probability that the number of failures is > l, where again l > nq. The probability P(k, l) of the inequalities k ≤ m ≤ l can be expressed as follows:

$$P(k, l) = 1 - P(l) - Q(n-k)$$

if l > np and k < np;

$$P(k, l) = P(k-1) - P(l)$$

if both k and l are > np; and finally

$$P(k, l) = Q(n-l-1) - Q(n-k)$$

if both k and l are < np.
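The three expressions for P(k, l) can be checked against a direct sum with exact fractions. In the sketch below, `P`, `Q`, and `P_between` are illustrative names that mirror the notation of the text. All three right-hand sides are in fact identities valid for any k ≤ l. The case distinction in the text serves the continued-fraction method that follows, which assumes each tail is taken beyond the mean; it does not affect the correctness of the identities.

```python
from fractions import Fraction
from math import comb


def pmf(n, m, p):
    """Probability of exactly m successes in n trials (exact for rational p)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def P(n, l, p):
    """P(l): probability that the number of successes is greater than l."""
    return sum((pmf(n, m, p) for m in range(l + 1, n + 1)), Fraction(0))


def Q(n, l, p):
    """Q(l): probability that the number of failures is greater than l."""
    return P(n, l, 1 - p)


def P_between(n, k, l, p):
    """P(k, l) = probability of k <= m <= l, chosen case by case as in Sec. 8."""
    if k < n * p < l:
        return 1 - P(n, l, p) - Q(n, n - k, p)
    if k > n * p:                                   # both k and l exceed np
        return P(n, k - 1, p) - P(n, l, p)
    return Q(n, n - l - 1, p) - Q(n, n - k, p)      # both k and l below np


n, p = 30, Fraction(1, 3)                            # np = 10
for k, l in [(5, 15), (12, 20), (2, 8)]:
    direct = sum(pmf(n, m, p) for m in range(k, l + 1))
    print(k, l, P_between(n, k, l, p) == direct)
```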

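For P(l) we have the expression

$$P(l) = \frac{n!}{(l+1)!\,(n-l-1)!}\,p^{l+1}q^{n-l-1}\left[1 + \frac{n-l-1}{l+2}\cdot\frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)}\left(\frac{p}{q}\right)^{2} + \cdots\right].$$

The first factor

$$T_{l+1} = \frac{n!}{(l+1)!\,(n-l-1)!}\,p^{l+1}q^{n-l-1}$$

can be approximately evaluated by the method of the preceding section. The whole difficulty resides in the evaluation of the sum

$$S = 1 + \frac{n-l-1}{l+2}\cdot\frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)}\left(\frac{p}{q}\right)^{2} + \cdots,$$

which is a particular case of the hypergeometric series

$$F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1\cdot\gamma}\,x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot 2\cdot\gamma(\gamma+1)}\,x^{2} + \cdots.$$

In fact,

$$F\!\left(-n+l+1,\ 1,\ l+2,\ -\frac{p}{q}\right) = S.$$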

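Now, owing to this connection between S and the hypergeometric series, S can be represented in the form of a continued fraction. First, it is easy to establish the following relations:

$$F(\alpha, \beta+1, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\alpha(\gamma-\beta)}{\gamma(\gamma+1)}\,x\,F(\alpha+1, \beta+1, \gamma+2, x),$$

$$F(\alpha+1, \beta, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\beta(\gamma-\alpha)}{\gamma(\gamma+1)}\,x\,F(\alpha+1, \beta+1, \gamma+2, x).$$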
Substituting α + n, β + n, γ + 2n and α + n, β + n + 1, γ + 2n + 1, respectively, for α, β, γ in these relations and setting

$$X_{2n} = F(\alpha+n, \beta+n, \gamma+2n, x),\qquad X_{2n+1} = F(\alpha+n, \beta+n+1, \gamma+2n+1, x),$$

$$a_{2n} = \frac{(\beta+n)(\gamma-\alpha+n)}{(\gamma+2n-1)(\gamma+2n)},\qquad a_{2n+1} = \frac{(\alpha+n)(\gamma-\beta+n)}{(\gamma+2n)(\gamma+2n+1)}$$

for brevity, we have

$$X_0 = X_1 - a_1 x X_2,\qquad X_1 = X_2 - a_2 x X_3,\qquad \ldots$$


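whence

$$\frac{X_1}{X_0} = \cfrac{1}{1 - \cfrac{a_1 x}{1 - \cfrac{a_2 x}{\;\ddots\; 1 - \cfrac{a_{m-1} x}{1 - a_m x\,X_{m+1}/X_m}}}}.$$

In our particular case (that is, α = −n + l + 1, β = 0, γ = l + 1)

$$X_1 = F(-n+l+1,\ 1,\ l+2,\ x),\qquad X_0 = 1,$$

and $a_{2n-2l-1} = 0$. Hence, taking x = −p/q and introducing new notations, we have a finite continued fraction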


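$$S = \cfrac{1}{1 - \cfrac{c_1}{1 + \cfrac{d_1}{1 - \cfrac{c_2}{1 + \cfrac{d_2}{\;\ddots\; 1 - \cfrac{c_{n-l-1}}{1 + d_{n-l-1}}}}}}}, \tag{5}$$

where

$$c_k = \frac{(n-k-l)(l+k)\,p}{(l+2k-1)(l+2k)\,q},\qquad d_k = \frac{k(n+k)\,p}{(l+2k)(l+2k+1)\,q}. \tag{6}$$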
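As a check on (5) and (6), the following sketch evaluates S in three independent ways with exact rational arithmetic: as the series itself, as the terminating hypergeometric series $F(-n+l+1, 1, l+2, -p/q)$, and as the finite continued fraction (5) built from the coefficients (6) and evaluated from the bottom up. All function names are illustrative, and the example assumes l > np, as in the text.

```python
from fractions import Fraction


def S_series(n, l, p):
    """S = sum_j T_{l+1+j} / T_{l+1}, using T_{m+1}/T_m = (n - m)p / ((m + 1)q)."""
    q, total, t = 1 - p, Fraction(0), Fraction(1)
    for j in range(n - l):
        total += t
        t *= Fraction(n - l - 1 - j, l + 2 + j) * p / q
    return total


def hyp2f1_terminating(a, b, c, x):
    """F(a, b, c, x) for a nonpositive integer a, so the series stops by itself."""
    total, t, k = Fraction(0), Fraction(1), 0
    while t != 0:
        total += t
        t = t * (a + k) * (b + k) / ((k + 1) * (c + k)) * x
        k += 1
    return total


def c_coef(n, l, p, k):
    """c_k of formula (6)."""
    return Fraction((n - k - l) * (l + k), (l + 2 * k - 1) * (l + 2 * k)) * p / (1 - p)


def d_coef(n, l, p, k):
    """d_k of formula (6)."""
    return Fraction(k * (n + k), (l + 2 * k) * (l + 2 * k + 1)) * p / (1 - p)


def S_continued_fraction(n, l, p):
    """Formula (5), from the bottom: omega_k = c_k / (1 + d_k / (1 - omega_{k+1}))."""
    omega = Fraction(0)                  # omega_{n-l} = 0 because c_{n-l} = 0
    for k in range(n - l - 1, 0, -1):
        omega = c_coef(n, l, p, k) / (1 + d_coef(n, l, p, k) / (1 - omega))
    return 1 / (1 - omega)               # S = 1 / (1 - omega_1)


n, l, p = 40, 16, Fraction(1, 3)         # l > np = 40/3
print(S_series(n, l, p) == hyp2f1_terminating(-n + l + 1, 1, l + 2, -p / (1 - p))
      == S_continued_fraction(n, l, p))
```

It prints `True`. Because the arithmetic is exact, the equality is a genuine check of the continued-fraction expansion rather than a floating-point coincidence.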


Every one of the numbers $c_k$ will be positive and < 1 if this is true for $c_1$. Now

$$c_1 = \frac{(n-l-1)\,p}{(l+2)\,q} < 1$$

if l > np, and that is exactly what we suppose. The above continued fraction can be used to obtain approximate values of S in excess or in defect, as we please. Let us denote the continued fraction

$$\cfrac{c_k}{1 + \cfrac{d_k}{1 - \cfrac{c_{k+1}}{1 + \ddots}}}$$

by $\omega_k$. Then

$$0 < \omega_k < c_k,$$

which can be easily verified. Furthermore,


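$$S = \frac{1}{1-\omega_1},$$

and in general

$$\omega_1 = \cfrac{c_1}{1 + \cfrac{d_1}{1-\omega_2}},\qquad \omega_2 = \cfrac{c_2}{1 + \cfrac{d_2}{1-\omega_3}},\qquad \ldots,\qquad \omega_k = \cfrac{c_k}{1 + \cfrac{d_k}{1-\omega_{k+1}}}.$$

Having selected k, depending on the degree of approximation we desire in the final result (but never too large; k = 5 or less generally suffices), we use the inequality

$$0 < \omega_{k+1} < c_{k+1}$$

to obtain two limits, in defect and in excess, for $\omega_k$. Using these limits, we obtain similar limits for $\omega_{k-1}, \ldots$, and, finally, for $\omega_1$ and S.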
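The bounding procedure can be sketched in Python as follows. The names are illustrative, and exact fractions replace the hand computation's outward rounding to 5 decimals. Each step uses the fact that $\omega_j = c_j/(1 + d_j/(1-\omega_{j+1}))$ decreases as $\omega_{j+1}$ increases, so the upper limit for $\omega_{j+1}$ yields the lower limit for $\omega_j$, and vice versa. A plain floating-point sum of the series is included only for comparison.

```python
from fractions import Fraction


def c_coef(n, l, p, k):
    """c_k of formula (6), exact for rational p."""
    return Fraction((n - k - l) * (l + k), (l + 2 * k - 1) * (l + 2 * k)) * p / (1 - p)


def d_coef(n, l, p, k):
    """d_k of formula (6), exact for rational p."""
    return Fraction(k * (n + k), (l + 2 * k) * (l + 2 * k + 1)) * p / (1 - p)


def S_limits(n, l, p, k=5):
    """Limits in defect and in excess for S, starting from 0 < omega_{k+1} < c_{k+1}."""
    if not (l > n * p and k + 1 <= n - l - 1):
        raise ValueError("need l > np and k + 1 <= n - l - 1")
    lo, hi = Fraction(0), c_coef(n, l, p, k + 1)
    for j in range(k, 0, -1):
        c, d = c_coef(n, l, p, j), d_coef(n, l, p, j)
        # omega_j decreases as omega_{j+1} increases, so the limits swap roles.
        lo, hi = c / (1 + d / (1 - hi)), c / (1 + d / (1 - lo))
    return 1 / (1 - lo), 1 / (1 - hi)          # S = 1 / (1 - omega_1)


def S_direct(n, l, p):
    """S summed term by term in floating point, for comparison only."""
    ratio, total, t = float(p / (1 - p)), 0.0, 1.0
    for j in range(n - l):
        total += t
        t *= (n - l - 1 - j) / (l + 2 + j) * ratio
    return total


s_lo, s_hi = S_limits(9000, 3090, Fraction(1, 3))
print(float(s_lo), S_direct(9000, 3090, Fraction(1, 3)), float(s_hi))
```

With the data of the next section and k = 5, the directly summed S falls between the two limits, which are roughly 19.15 and 19.27.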

The series of operations will be better illustrated by an example. 

9. Let us find approximately the probability that in 9,000 trials an event with the probability p = 1/3 will occur not more than 3,090 times and not less than 2,910 times. To this end we must first seek the probability of more than 3,090 occurrences, which involves, in the first place, the evaluation of

$$T_{3091} = \frac{9000!}{3091!\,5909!}\left(\frac{1}{3}\right)^{3091}\left(\frac{2}{3}\right)^{5909}.$$

By using inequalities (3) and (4) of Sec. 7, we find

$$0.0011286 < T_{3091} < 0.0011287.$$


Next we turn to the continued fraction to evaluate the sum S. The following table gives approximate values of $c_1, c_2, \ldots, c_6$ and $d_1, d_2, \ldots, d_5$ to 5 decimals, less than the exact numbers:

k    c_k       d_k
1    0.95553   0.00047
2    0.95444   0.00094
3    0.95335   0.00140
4    0.95227   0.00187
5    0.95119   0.00234
6    0.95010



We start with the inequalities

$$0 < \omega_6 < 0.95011$$

and then proceed as follows:

$$\begin{aligned}
1.00234 &< 1 + \frac{d_5}{1-\omega_6} < 1.04711; & 0.90839 &< \omega_5 < 0.94898\\
1.02041 &< 1 + \frac{d_4}{1-\omega_5} < 1.03685; & 0.91842 &< \omega_4 < 0.93324\\
1.01716 &< 1 + \frac{d_3}{1-\omega_4} < 1.02113; & 0.93362 &< \omega_3 < 0.93728\\
1.01416 &< 1 + \frac{d_2}{1-\omega_3} < 1.01514; & 0.94020 &< \omega_2 < 0.94113\\
1.00785 &< 1 + \frac{d_1}{1-\omega_2} < 1.00816; & 0.94779 &< \omega_1 < 0.94810\\
0.05190 &< 1 - \omega_1 < 0.05221; & 0.02161 &< S\,T_{3091} < 0.02175.
\end{aligned}$$


Hence, we know for certain that 

0.02161 < P(3,090) < 0.02176. 

By a similar calculation it was found that 

0.02129 < Q(6,090) < 0.02142, 

so that 

0.04290 < P(3,090) + Q(6,090) < 0.04317. 

The required probability P that the number of successes will be contained between 2,910 and 3,090 (limits included) lies between 0.95683 and 0.95710, so that, taking P = 0.9570, the error in absolute value will be less than $1.7 \times 10^{-4}$.
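These limits can be compared with exact values, which were out of reach by hand. Because p = 1/3 is rational, the tail P(l) can be summed exactly with Python integers by updating $b^n T_m$ through the ratio $T_{m+1}/T_m = (n-m)p/((m+1)q)$. Q(l) is the same sum with p and q interchanged. The helper name `upper_tail` is our own. This is a brute-force check, not a substitute for Markoff's method.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, l, p):
    """Exact P(l) = Pr(more than l successes) for rational p = a/b, as a float."""
    a, b = p.numerator, p.denominator
    # term = b^n * T_m, kept as an exact integer and updated by T_{m+1} / T_m.
    term = comb(n, l + 1) * a ** (l + 1) * (b - a) ** (n - l - 1)
    total = 0
    for m in range(l + 1, n + 1):
        total += term
        if m < n:
            term = term * (n - m) * a // ((m + 1) * (b - a))   # exact division
    return total / b**n


n, p = 9000, Fraction(1, 3)
P_3090 = upper_tail(n, 3090, p)          # more than 3,090 successes
Q_6090 = upper_tail(n, 6090, 1 - p)      # more than 6,090 failures
inside = 1 - P_3090 - Q_6090             # 2,910 <= m <= 3,090
print(f"P(3090) = {P_3090:.5f}, Q(6090) = {Q_6090:.5f}, P = {inside:.5f}")
```

The printed values are expected to fall inside the limits obtained above, with P agreeing with 0.9570 to within $1.7 \times 10^{-4}$.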


Problems for Solution 


1. What is the probability of having 12 three times in 100 tosses of 2 dice? Ans. $C_{100}^{3}\left(\frac{1}{36}\right)^{3}\left(\frac{35}{36}\right)^{97} = 0.2255.$







2. What is the probability for an event E to occur at least once, or twice, or three times, in a series of n independent trials with the probability p? Ans.

(a) $1 - (1-p)^n$; (b) $1 - (1-p)^{n-1}\left[1 + (n-1)p\right]$;

(c) $1 - (1-p)^{n-2}\left[1 + (n-2)p + \frac{(n-1)(n-2)}{2}\,p^2\right]$.

3. What is the probability of having 12 points with 2 dice at least three times in 100 throws? Ans. 0.528.

4. In a series of 100 independent trials with the probability 1/3, what is the most probable number of successes and its probability? Ans. μ = 33; $T_{33} = 0.0844$.

Note: Log 100! = 157.97000; Log 67! = 94.56195; Log 33! = 36.93869.

5. A player wins $1 if he throws heads two times in succession; otherwise he loses 25 cents. If this game is repeated 100 times, what is the probability that neither his gain nor his loss will exceed $1? Or $5? Ans.

(a) $$\frac{100!}{20!\,80!}\left(\frac14\right)^{20}\left(\frac34\right)^{80} = 0.0493;$$

(b) $$\frac{100!}{20!\,80!}\left(\frac14\right)^{20}\left(\frac34\right)^{80}\left[1 + \frac{80}{63} + \frac{80\cdot79}{63\cdot66} + \frac{80\cdot79\cdot78}{63\cdot66\cdot69} + \frac{80\cdot79\cdot78\cdot77}{63\cdot66\cdot69\cdot72} + \frac{60}{81} + \frac{60\cdot57}{81\cdot82} + \frac{60\cdot57\cdot54}{81\cdot82\cdot83} + \frac{60\cdot57\cdot54\cdot51}{81\cdot82\cdot83\cdot84}\right].$$

Note: Log 20! = 18.38612; Log 80! = 118.85473.

6. Show that in a series of 2s trials with the probability 1/2 the most probable number of successes is s and the corresponding probability is

$$T_s = \frac{1\cdot3\cdot5\cdots(2s-1)}{2\cdot4\cdot6\cdots2s}.$$

Show also that

$$T_s < \frac{1}{\sqrt{2s+1}}.$$

Hint:

$$T_s < \frac{2\cdot4\cdot6\cdots2s}{3\cdot5\cdot7\cdots(2s+1)}.$$


7. Prove the following theorem: If P and P' are the probabilities of the most probable number of successes in n and n + 1 trials, respectively, then P' ≤ P, the equality sign being excluded unless (n + 1)p is an integer.

8. Show that the probability $T_\mu$, corresponding to the most probable number of successes in n trials, is asymptotic to $(2\pi npq)^{-1/2}$, that is,

$$\lim_{n\to\infty} T_\mu\sqrt{2\pi npq} = 1.$$

9. When p = the following inequality holds for every m: 
10. What is the probability of 215 successes in 1,000 trials if p =

Ans. 0.0154.

11. What is the probability that in 2,000 trials the number of successes will be contained between 460 and 540 (limits included) if p = Ans. 0.964.

12. Two players A and B agree to play until one of them wins a certain number of games, the probabilities for A and B to win a single game being p and q = 1 − p. However, they are forced to quit when A has a games still to win and B has b games. How should they divide their total stake to be fair?

This problem is known as the "problème des partis," one of the first problems on probability discussed and solved by Fermat and Pascal in their correspondence.

Solution 1. Let P denote the probability that A will win a remaining games before B can win b games, and let Q = 1 − P denote the probability for B to win b games before A wins a games. To be fair, the players must divide their common stake M in the ratio P:Q and leave the sum MP to A and the sum MQ to B.

To find P, notice that A wins in the following mutually exclusive ways:

a. If he wins in exactly a games; probability $p^a$.

b. If he wins in exactly a + 1 games; probability $\frac{a}{1}\,p^a q$.

c. If he wins in exactly a + 2 games; probability $\frac{a(a+1)}{1\cdot2}\,p^a q^2$.

. . . . . . . . . . . . . . .

n. If he wins in exactly a + b − 1 games; probability $\frac{a(a+1)\cdots(a+b-2)}{1\cdot2\cdots(b-1)}\,p^a q^{b-1}$.

Consequently

$$P = p^a\left[1 + \frac{a}{1}\,q + \frac{a(a+1)}{1\cdot2}\,q^2 + \cdots + \frac{a(a+1)\cdots(a+b-2)}{1\cdot2\cdots(b-1)}\,q^{b-1}\right]$$

and similarly

$$Q = q^b\left[1 + \frac{b}{1}\,p + \frac{b(b+1)}{1\cdot2}\,p^2 + \cdots + \frac{b(b+1)\cdots(a+b-2)}{1\cdot2\cdots(a-1)}\,p^{a-1}\right].$$

Show directly that P + Q = 1.

Hint: $$\frac{dP}{dp} + \frac{dQ}{dp} = 0.$$

Solution 2. The same problem can be solved in a different way. Whether A or B wins will be decided in not more than a + b − 1 games. Now if the players continue to play until the number of games reaches the limit a + b − 1, the number of games won by A must be not less than a if A is the winner. And conversely, if this number is not less than a, A will win a games before B wins b games. Therefore, P is the probability that in a + b − 1 games A wins not less than a times, or

$$P = \frac{(a+b-1)!}{a!\,(b-1)!}\,p^a q^{b-1}\left[1 + \frac{b-1}{a+1}\cdot\frac{p}{q} + \frac{(b-1)(b-2)}{(a+1)(a+2)}\left(\frac{p}{q}\right)^2 + \cdots + \frac{(b-1)(b-2)\cdots2\cdot1}{(a+1)(a+2)\cdots(a+b-1)}\left(\frac{p}{q}\right)^{b-1}\right].$$

Show directly that both expressions for P are identical.






Hint: Proceed as before. 
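Before attempting the direct proofs, the two expressions for P, and the relation P + Q = 1, can be checked numerically. In the sketch below (illustrative names), $\frac{a(a+1)\cdots(a+j-1)}{1\cdot2\cdots j}$ is written as the binomial coefficient C(a − 1 + j, j).

```python
from fractions import Fraction
from math import comb


def P_solution1(a, b, p):
    """Solution 1: A wins in exactly a + j games, j = 0, ..., b - 1."""
    q = 1 - p
    return p**a * sum(comb(a - 1 + j, j) * q**j for j in range(b))


def P_solution2(a, b, p):
    """Solution 2: A wins at least a of the a + b - 1 games that decide the match."""
    q, N = 1 - p, a + b - 1
    return sum(comb(N, w) * p**w * q ** (N - w) for w in range(a, N + 1))


p = Fraction(1, 2)
print(P_solution1(2, 3, p), P_solution2(2, 3, p))   # A needs 2 games, B needs 3
```

Both print `11/16`, so in this case the stake would be divided in the ratio 11:5. Q is obtained as `P_solution1(b, a, 1 - p)`.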

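13. Prove the identity (q = 1 − p)

$$p^n + \frac{n}{1}\,p^{n-1}q + \frac{n(n-1)}{1\cdot2}\,p^{n-2}q^2 + \cdots + \frac{n(n-1)\cdots(n-k+1)}{1\cdot2\cdots k}\,p^{n-k}q^k = \frac{\int_0^p x^{n-k-1}(1-x)^k\,dx}{\int_0^1 x^{n-k-1}(1-x)^k\,dx}.$$

Hint: Take derivatives with respect to p.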

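14. A and B have, respectively, n + 1 and n coins. If they toss their coins simultaneously, what is the probability that (a) A will have more heads than B? (b) A and B will have an equal number of heads? (c) B will have more heads than A?

Solution. a. Let $P_n$ be the probability for A to have more heads than B. This probability can be expressed as the double sum

$$P_n = \frac{1}{2^{2n+1}}\sum_{\beta=0}^{n}\sum_{\alpha>\beta} C_{n+1}^{\alpha}C_n^{\beta}.$$

Considering the coefficient of $t^l$ in

$$(1+t)^{n+1}\left(1+\frac{1}{t}\right)^{n} = \frac{(1+t)^{2n+1}}{t^n},$$

we find

$$\sum_{\alpha-\beta=l} C_{n+1}^{\alpha}C_n^{\beta} = C_{2n+1}^{n+l},$$

whence

$$P_n = \frac{1}{2^{2n+1}}\sum_{l=1}^{n+1} C_{2n+1}^{n+l} = \frac12.$$

b. The probability $Q_n$ for A and B to have an equal number of heads is

$$Q_n = \frac{1}{2^{2n+1}}\sum_{\alpha=0}^{n} C_{n+1}^{\alpha}C_n^{\alpha} = \frac{C_{2n+1}^{n}}{2^{2n+1}}.$$

c. The probability $R_n$ for B to have more heads than A is

$$R_n = \frac12 - \frac{C_{2n+1}^{n}}{2^{2n+1}}.$$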
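The three answers can be confirmed for small n by enumerating every pair of head counts. A minimal sketch follows; the function name is our own.

```python
from fractions import Fraction
from math import comb


def coin_race(n):
    """Exact (P_n, Q_n, R_n) when A tosses n + 1 coins and B tosses n coins."""
    total = 2 ** (2 * n + 1)
    more = equal = 0
    for alpha in range(n + 2):            # heads of A
        for beta in range(n + 1):         # heads of B
            ways = comb(n + 1, alpha) * comb(n, beta)
            if alpha > beta:
                more += ways
            elif alpha == beta:
                equal += ways
    fewer = total - more - equal
    return Fraction(more, total), Fraction(equal, total), Fraction(fewer, total)


for n in range(4):
    print(n, coin_race(n))
```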


15. If each of n independent trials can result in one of the m incompatible events $E_1, E_2, \ldots, E_m$ with the respective probabilities

$$p_1, p_2, \ldots, p_m\qquad (p_1 + p_2 + \cdots + p_m = 1),$$

show that the probability of having $l_1$ events $E_1$, $l_2$ events $E_2$, ..., $l_m$ events $E_m$, where $l_1 + l_2 + \cdots + l_m = n$, is given by

$$P_{l_1, l_2, \ldots, l_m} = \frac{n!}{l_1!\,l_2!\cdots l_m!}\,p_1^{l_1}p_2^{l_2}\cdots p_m^{l_m}.$$



CHAPTER IV 


PROBABILITIES OF HYPOTHESES AND BAYES’ THEOREM 

1. The nature of the problems with which we deal in this chapter may 
be illustrated by the following simple example: Urns 1 and 2 contain, 
respectively, 2 white and 3 black balls, and 4 white and 1 black balls. 
One of the urns is selected at random and one ball is drawn. It happens 
to be white. What is the probability that it came from the first urn? 
Before the ball was drawn and its color revealed, the probability that the 
first urn would be chosen had been 1/2; but the indication of the color 
of the ball that was drawn altered this probability. To find this new 
probability, the following artifice can be used: 

Imagine that the balls from both urns are put together in a third urn. To distinguish their origin, balls from the first urn are marked with 1 and those from the second urn are marked with 2. Since there are 5 balls marked with 1 and the same number marked with 2, in taking one ball from the third urn we have equal chances to take one coming from either the first or the second urn, and the situation is exactly the same as if we chose one of the urns at random and drew one ball from it. If the ball drawn from the third urn happens to be white, this can happen in 2 + 4 = 6 equally likely cases. Only in 2 of these cases will the extracted ball have the mark 1. Hence, the probability that the white ball came from the first urn is 2/6 = 1/3.

The success of this artifice depends on the equality of the number of 
balls in both urns. It can be applied to the case of an unequal number 
of balls in the urns, but with some modifications; however, it seems 
preferable to follow a regular method for solving problems like the 
preceding one. 

2. The problem just solved is a particular case of the following fundamental one:

Problem 1. An event A can occur only if one of the set of exhaustive and incompatible events

$$B_1, B_2, \ldots, B_n$$

occurs. The probabilities of these events,

$$(B_1), (B_2), \ldots, (B_n),$$

corresponding to the total absence of any knowledge as to the occurrence or nonoccurrence of A, are known. Known also are the conditional probabilities

$$(A, B_i);\qquad i = 1, 2, \ldots, n$$

for A to occur, assuming the occurrence of $B_i$. How does the probability of $B_i$ change with the additional information that A has actually happened?

Solution. The question amounts to finding the conditional probability $(B_i, A)$. The probability of the compound event $AB_i$ can be presented in two forms,

$$(AB_i) = (B_i)(A, B_i)$$

or

$$(AB_i) = (A)(B_i, A).$$

Equating the right-hand members, we derive the following expression for the unknown probability $(B_i, A)$:

$$(B_i, A) = \frac{(B_i)(A, B_i)}{(A)}.$$

Since the event A can materialize in the mutually exclusive forms

$$AB_1, AB_2, \ldots, AB_n,$$

by applying the theorem of total probability we get

$$(A) = (B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n).$$

It suffices now to introduce this expression into the preceding formula for $(B_i, A)$ to get the final expression

$$(B_i, A) = \frac{(B_i)(A, B_i)}{(B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n)}. \tag{1}$$

This formula, when described in words, constitutes the so-called "Bayes' theorem." However, it is hardly necessary to describe its content in words; symbols speak better for themselves. For that reason, we prefer to speak of Bayes' formula rather than of Bayes' theorem. Bayes' formula is also known as the "formula for probabilities of hypotheses." The reason for that name is that the events $B_1, B_2, \ldots, B_n$ may be considered as hypotheses to account for the occurrence of A. It is customary to speak of the probabilities

$$(B_1), (B_2), \ldots, (B_n)$$

as a priori probabilities of the hypotheses

$$B_1, B_2, \ldots, B_n,$$

while the probabilities

$$(B_i, A);\qquad i = 1, 2, \ldots, n$$

are called a posteriori probabilities of the same hypotheses.
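Once the a priori probabilities $(B_i)$ and the conditional probabilities $(A, B_i)$ are listed, formula (1) is a one-line computation. The minimal Python sketch below works with exact fractions. The helper names `bayes` and `white_and_red` are our own. It reproduces the urn problem of Sec. 1 and, looking ahead, the data of Example 1 in Sec. 3.

```python
from fractions import Fraction
from math import comb


def bayes(priors, likelihoods):
    """Formula (1): a posteriori probabilities (B_i, A) from (B_i) and (A, B_i)."""
    joint = [Fraction(b) * Fraction(a) for b, a in zip(priors, likelihoods)]
    total = sum(joint)                    # (A), by the theorem of total probability
    return [j / total for j in joint]


# Sec. 1: urn 1 has 2 white and 3 black balls, urn 2 has 4 white and 1 black; a white ball is drawn.
print(bayes([Fraction(1, 2)] * 2, [Fraction(2, 5), Fraction(4, 5)]))


def white_and_red(white, black, red):
    """Probability that two balls drawn together are one white and one red."""
    return Fraction(white * red, comb(white + black + red, 2))


urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]          # contents of urns 1, 2, 3 in Example 1
post = bayes([Fraction(1, 3)] * 3, [white_and_red(*u) for u in urns])
print(post, post[1] + post[2])
```

The first line gives 1/3 for the first urn. The second gives the a posteriori probabilities 33/118, 55/118, 30/118 of Example 1, and 85/118 for "urn 2 or 3".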

3. A few examples will help us to understand the meaning and the 
use of Bayes’ formula. 


Example 1. 


The contents of urns 1, 2, 3 are as follows:

urn 1: 1 white, 2 black, 3 red balls;
urn 2: 2 white, 1 black, 1 red balls;
urn 3: 4 white, 5 black, 3 red balls.

One urn is chosen at random and two balls are drawn. They happen to be white and red. What is the probability that they came from urn 2 or 3?

Solution. The event A represents the fact that the two balls taken from the selected urn were white and red. To account for this fact, we have three hypotheses: The selected urn was 1 or 2 or 3. We shall represent these hypotheses, in the order indicated, by $B_1, B_2, B_3$. Since nothing distinguishes the urns, the probabilities of these hypotheses before anything was known about A are

$$(B_1) = (B_2) = (B_3) = \frac13.$$

The probabilities of A, assuming these hypotheses, are

$$(A, B_1) = \frac15,\qquad (A, B_2) = \frac13,\qquad (A, B_3) = \frac{2}{11}.$$

It remains now to introduce these values into formula (1) to obtain the a posteriori probabilities

$$(B_2, A) = \frac{\frac13\cdot\frac13}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac{2}{11}} = \frac{55}{118},\qquad (B_3, A) = \frac{\frac13\cdot\frac{2}{11}}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac{2}{11}} = \frac{30}{118},$$

and also, naturally,

$$(B_1, A) = 1 - (B_2, A) - (B_3, A) = \frac{33}{118}.$$

The required probability that the balls came from urn 2 or 3 is therefore 85/118.


Example 2. It is known that an urn containing altogether 10 balls was filled in the following manner: A coin was tossed 10 times, and according as it showed heads or tails, one white or one black ball was put into the urn. Balls are drawn from this urn one at a time, 10 times in succession (always being returned before the next drawing), and every one turns out to be white. What is the probability that the urn contains nothing but white balls?

Solution. The event A consists in the fact that in 10 independent trials with a definite but unknown probability, only white balls appear. To account for this fact, we have 10 hypotheses regarding the number of white balls in the urn; namely, that this number is either 1, or 2, or 3, ..., or 10. The a priori probability of the hypothesis $B_i$ that there are exactly i white balls in the urn, according to the manner in which the urn was filled, is the same as the probability of having i heads in 10 throws; that is,

$$(B_i) = \frac{10!}{i!\,(10-i)!}\cdot\frac{1}{2^{10}},\qquad i = 1, 2, \ldots, 10.$$

Granted the hypothesis $B_i$, the probability of A is

$$(A, B_i) = \left(\frac{i}{10}\right)^{10}.$$

The problem requires us to find $(B_{10}, A)$. The expression of this probability immediately results from Bayes' formula:

$$(B_{10}, A) = \frac{\dfrac{1}{2^{10}}}{\displaystyle\sum_{i=1}^{10}\frac{10!}{i!\,(10-i)!}\cdot\frac{1}{2^{10}}\left(\frac{i}{10}\right)^{10}}.$$

The denominator of this fraction is

$$\frac{1}{2^{10}}\sum_{i=1}^{10}\frac{10!}{i!\,(10-i)!}\left(\frac{i}{10}\right)^{10} = \frac{14.247}{2^{10}}.$$

Hence

$$(B_{10}, A) = \frac{1}{14.247} = 0.0702.$$

This probability, although still small, is much greater than 1/1024, the a priori probability of having only white balls in the urn.

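If, instead of 10 drawings, m drawings have been made and at each drawing a white ball appeared, the probability $(B_{10}, A)$ would be given by

$$(B_{10}, A) = \frac{1}{\displaystyle\sum_{i=1}^{10}\frac{10!}{i!\,(10-i)!}\left(\frac{i}{10}\right)^{m}}.$$

The denominator of this formula can be presented thus:

$$\sum_{i=0}^{9}\frac{10!}{i!\,(10-i)!}\left(1-\frac{i}{10}\right)^{m}.$$

Now

$$\left(1-\frac{i}{10}\right)^{m} < e^{-mi/10}\qquad (i = 1, 2, \ldots, 9),$$

and so

$$\sum_{i=0}^{9}\frac{10!}{i!\,(10-i)!}\left(1-\frac{i}{10}\right)^{m} < \sum_{i=0}^{10}\frac{10!}{i!\,(10-i)!}\,e^{-mi/10} = \left(1+e^{-m/10}\right)^{10}.$$

Hence

$$(B_{10}, A) > \left(1+e^{-m/10}\right)^{-10}.$$

This shows that with increasing m the probability $(B_{10}, A)$ rapidly approaches 1. For instance, if m = 100,

$$(B_{10}, A) > \left(1+e^{-10}\right)^{-10} > (1.0000454)^{-10} > 0.99954.$$

Thus, after 100 drawings producing only white balls, it is almost certain that the urn contains nothing but white balls, a conclusion which mere common sense would dictate.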
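The numbers in Example 2 can be reproduced with exact fractions. The sketch below uses illustrative function names. It computes $(B_{10}, A)$ after m white drawings, with the common factor $1/2^{10}$ of the a priori probabilities cancelled, together with the lower limit $(1+e^{-m/10})^{-10}$ derived above.

```python
from fractions import Fraction
from math import comb, exp


def all_white_posterior(m):
    """(B_10, A) after m white drawings; the factor 1/2^10 of the priors cancels."""
    denom = sum(comb(10, i) * Fraction(i, 10) ** m for i in range(1, 11))
    return 1 / denom


def lower_limit(m):
    """The lower limit (1 + e^(-m/10))^(-10) obtained in the text."""
    return (1 + exp(-m / 10)) ** -10


print(float(1 / all_white_posterior(10)))      # the denominator, about 14.247
print(float(all_white_posterior(10)))           # about 0.0702
print(float(all_white_posterior(100)), lower_limit(100))
```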

Example 3. Two urns, 1 and 2, contain, respectively, 2 white and 1 black ball, and 1 white and 5 black balls. One ball is transferred from urn 1 to urn 2, and then one ball is drawn from the latter. It happens to be white. What is the probability that the transferred ball was black?





Solution. Here we have two hypotheses: $B_1$, that the transferred ball was black, and $B_2$, that it was white. The a priori probabilities of these hypotheses are

$$(B_1) = \frac13,\qquad (B_2) = \frac23.$$

The probabilities of drawing a white ball from urn 2, granted that $B_1$ or $B_2$ is true, are

$$(A, B_1) = \frac17,\qquad (A, B_2) = \frac27.$$

The probability of $B_1$, after a white ball has been drawn from the second urn, results from Bayes' formula:

$$(B_1, A) = \frac{\frac13\cdot\frac17}{\frac13\cdot\frac17 + \frac23\cdot\frac27} = \frac15.$$


4. Problem 2. Retaining the notations, conditions, and data of Prob. 1, find the probability of the materialization of another event C, granted that A has actually occurred. The conditional probabilities

$$(C, AB_i);\qquad i = 1, 2, \ldots, n$$

are supposed to be known.

Solution. Since the occurrence of A involves that of one, and only one, of the events

$$B_1, B_2, \ldots, B_n,$$

the event C (granted the occurrence of A) can materialize in the following mutually exclusive forms:

$$CB_1, CB_2, \ldots, CB_n.$$

Consequently, the probability (C, A) which we are seeking is given by

$$(C, A) = (CB_1, A) + (CB_2, A) + \cdots + (CB_n, A).$$

Applying the theorem of compound probability, we have

$$(CB_i, A) = (B_i, A)(C, AB_i)$$

and

$$(C, A) = (B_1, A)(C, AB_1) + (B_2, A)(C, AB_2) + \cdots + (B_n, A)(C, AB_n).$$

It suffices now to substitute for $(B_i, A)$ its expression given by Bayes' formula to find the final expression

$$(C, A) = \frac{\sum_{i=1}^{n}(B_i)(A, B_i)(C, AB_i)}{\sum_{i=1}^{n}(B_i)(A, B_i)}. \tag{2}$$
It may happen that the materialization of hypothesis $B_i$ makes C independent of A; then we have simply

$$(C, AB_i) = (C, B_i),$$

and instead of formula (2) we have the simplified formula

$$(C, A) = \frac{\sum_{i=1}^{n}(B_i)(A, B_i)(C, B_i)}{\sum_{i=1}^{n}(B_i)(A, B_i)}. \tag{3}$$

The event C can be considered, in regard to A, as a future event. For that reason formulas (2) and (3) express probabilities of future events. For a better understanding of these commonly used technical terms, we shall consider a simple example.

Example 4. From an urn containing 3 white and 5 black balls, 4 balls are transferred into an empty urn. From this urn 2 balls are taken, and they both happen to be white. What is the probability that the third ball taken from the same urn will be white?

Solution. (a) Let us suppose that the two balls drawn in the first place are returned to the second urn. Analyzing this problem, we first distinguish the following hypotheses concerning the colors of the 4 balls transferred from the first urn. Among them there are necessarily at least 2 white balls. Hence, there are only two possible hypotheses:

$B_1$: 2 white and 2 black balls;

$B_2$: 3 white and 1 black ball.

The a priori probabilities of these hypotheses are

$$(B_1) = \frac{C_3^2\,C_5^2}{C_8^4} = \frac37,\qquad (B_2) = \frac{C_3^3\,C_5^1}{C_8^4} = \frac{1}{14}.$$

The event A consists in the white color of both balls drawn from the second urn. The conditional probabilities $(A, B_1)$ and $(A, B_2)$ are

$$(A, B_1) = \frac16,\qquad (A, B_2) = \frac12.$$

The future event C consists in the white color of the third ball. Since the 2 balls drawn at first are returned, C becomes independent of A as soon as it is known which one of the hypotheses has materialized. Hence

$$(C, AB_1) = (C, B_1) = \frac12,\qquad (C, AB_2) = (C, B_2) = \frac34.$$

Substituting these various numbers in formula (3), we find that

$$(C, A) = \frac{\frac37\cdot\frac16\cdot\frac12 + \frac{1}{14}\cdot\frac12\cdot\frac34}{\frac37\cdot\frac16 + \frac{1}{14}\cdot\frac12} = \frac{7}{12}.$$
(b) If the two balls drawn in the first place are not returned, we have

$$(C, AB_1) = 0,\qquad (C, AB_2) = \frac12.$$

Then, making use of formula (2),

$$(C, A) = \frac{\frac{1}{14}\cdot\frac12\cdot\frac12}{\frac37\cdot\frac16 + \frac{1}{14}\cdot\frac12} = \frac16.$$
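Formulas (2) and (3) differ only in which conditional probabilities of C are supplied, so one function covers both. The sketch below (illustrative names) rebuilds the data of Example 4 from the urn contents and evaluates cases (a) and (b).

```python
from fractions import Fraction
from math import comb


def future_event(priors, likelihoods, future):
    """Formula (2): sum (B_i)(A,B_i)(C,AB_i) / sum (B_i)(A,B_i).
    Passing (C, B_i) as `future` gives formula (3)."""
    weights = [b * a for b, a in zip(priors, likelihoods)]
    return sum(w * c for w, c in zip(weights, future)) / sum(weights)


whites = [2, 3]                                   # hypotheses B_1 and B_2 (white balls among the 4)
priors = [Fraction(comb(3, w) * comb(5, 4 - w), comb(8, 4)) for w in whites]    # 3/7, 1/14
drew_two_white = [Fraction(comb(w, 2), comb(4, 2)) for w in whites]           # 1/6, 1/2
returned = [Fraction(w, 4) for w in whites]                                    # case (a)
kept_out = [Fraction(w - 2, 2) for w in whites]                                # case (b)
print(future_event(priors, drew_two_white, returned),
      future_event(priors, drew_two_white, kept_out))
```

It prints `7/12 1/6`, matching cases (a) and (b).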


5. The following problem can easily be solved by direct application of Bayes' formula.

Problem 3. A series of trials is performed which, with certain additional data, would appear as independent trials in regard to an event E with a constant probability p.

Lacking these data, all we know is that the unknown probability p must be one of the numbers

$$p_1, p_2, \ldots, p_k,$$

and we can assume these values with the respective probabilities

$$\alpha_1, \alpha_2, \ldots, \alpha_k.$$

In n trials the event E actually occurred m times. What is the probability that p lies between two given limits α and β (0 ≤ α < β ≤ 1), or, in other words, what is the probability of the inequalities

$$\alpha \le p \le \beta?$$

A particular case may illustrate the meaning of this problem. In a set of N urns, $N\alpha_1$ urns have white balls in proportion $p_1$ to the total number of balls; $N\alpha_2$ urns have white balls in proportion $p_2$; ...; $N\alpha_k$ urns have white balls in proportion $p_k$. An urn is chosen at random, and n drawings of one ball at a time are performed, the ball being returned each time before the next drawing so as to keep a constant proportion of white balls. It is found that altogether m white balls have appeared. What is the probability that one of the $N\alpha_i$ urns with the proportion $p_i$ of white balls was chosen? Evidently this is a particular case of the general problem, and here we possess knowledge of the necessary data, provided that the probability of selecting any one of the urns is the same.

Solution. We distinguish k exhaustive and mutually exclusive hypotheses: that the unknown probability is $p_1$, or $p_2$, ..., or $p_k$. The a priori probabilities of these hypotheses are, respectively, $\alpha_1, \alpha_2, \ldots, \alpha_k$. Assuming the hypothesis $p = p_i$, the probability of the event E occurring m times in n trials is

$$C_n^m\,p_i^m(1-p_i)^{n-m}.$$

Now, after E has actually happened m times in n trials, the a posteriori probability of the hypothesis $p = p_i$, by virtue of Bayes' formula, will be

$$\frac{\alpha_i C_n^m p_i^m(1-p_i)^{n-m}}{\sum_{i=1}^{k}\alpha_i C_n^m p_i^m(1-p_i)^{n-m}}$$

or, canceling $C_n^m$,

$$\frac{\alpha_i p_i^m(1-p_i)^{n-m}}{\sum_{i=1}^{k}\alpha_i p_i^m(1-p_i)^{n-m}}.$$

Now, applying the theorem of total probability, the probability P of the inequalities

$$\alpha \le p \le \beta$$

will be given by

$$P = \frac{\sum'\alpha_i p_i^m(1-p_i)^{n-m}}{\sum_{i=1}^{k}\alpha_i p_i^m(1-p_i)^{n-m}}, \tag{4}$$

where the summation Σ' in the numerator refers to all values of $p_i$ lying between α and β, limits included.

An important particular case arises when the set of hypothetical probabilities is

$$p_1 = \frac1k,\quad p_2 = \frac2k,\quad \ldots,\quad p_k = \frac kk = 1$$

and the a priori probabilities of these hypotheses are equal:

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac1k.$$

Then the fraction 1/k can be canceled in both numerator and denominator. The final formula for the probability of the inequalities

$$\alpha \le p \le \beta$$

will be

$$P = \frac{\sum'\left(\frac ik\right)^m\left(1-\frac ik\right)^{n-m}}{\sum_{i=1}^{k}\left(\frac ik\right)^m\left(1-\frac ik\right)^{n-m}}, \tag{5}$$

the summation in the numerator being extended over all positive integers i satisfying the inequalities

$$k\alpha \le i \le k\beta.$$

In the limit, when k tends to infinity, the a priori probability of the inequalities

$$\alpha \le p \le \beta$$

is given simply by the length β − α of the interval (α, β). The a posteriori probability of the same inequalities is obtained as the limit of expression (5). Now, as k → ∞, the sums

$$\frac1k\sum{}'\left(\frac ik\right)^m\left(1-\frac ik\right)^{n-m},\qquad \frac1k\sum_{i=1}^{k}\left(\frac ik\right)^m\left(1-\frac ik\right)^{n-m}$$

tend to the definite integrals

$$\int_\alpha^\beta x^m(1-x)^{n-m}\,dx\quad\text{and}\quad\int_0^1 x^m(1-x)^{n-m}\,dx.$$

Therefore, in the limit, the a posteriori probability of the inequalities

$$\alpha \le p \le \beta$$

is expressed by the ratio of two definite integrals

$$P = \frac{\int_\alpha^\beta x^m(1-x)^{n-m}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx}. \tag{6}$$

This formula leads to the following conclusion: When the unknown probability p of an event E may have any value between 0 and 1, and the a priori probability of its being contained between the limits α and β is β − α, then after n trials in which E occurred m times, the a posteriori probability of p being contained between α and β is given by formula (6).
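The passage from (5) to (6) can be watched numerically. In the sketch below the names are illustrative, and m = 7, n = 20, α = 1/5, β = 1/2 are arbitrary sample values, not taken from the text. It evaluates the discrete formula (5) for increasing k and compares it with (6). The integrals in (6) are computed exactly by expanding $(1-x)^{n-m}$ binomially.

```python
from fractions import Fraction
from math import comb


def posterior_discrete(m, n, alpha, beta, k):
    """Formula (5): hypotheses p = i/k (i = 1, ..., k) with equal a priori probabilities."""
    w = [(i / k) ** m * (1 - i / k) ** (n - m) for i in range(1, k + 1)]
    num = sum(w[i - 1] for i in range(1, k + 1) if k * alpha <= i <= k * beta)
    return num / sum(w)


def integral(m, r, lo, hi):
    """Exact integral of x^m (1 - x)^r over [lo, hi], expanding (1 - x)^r binomially."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(comb(r, j) * (-1) ** j * (hi ** (m + j + 1) - lo ** (m + j + 1)) / (m + j + 1)
               for j in range(r + 1))


def posterior_limit(m, n, alpha, beta):
    """Formula (6): the limit of (5) as k tends to infinity."""
    return integral(m, n - m, alpha, beta) / integral(m, n - m, 0, 1)


m, n, alpha, beta = 7, 20, Fraction(1, 5), Fraction(1, 2)
print(float(posterior_limit(m, n, alpha, beta)))
for k in (10, 100, 10_000):
    print(k, posterior_discrete(m, n, alpha, beta, k))
```

Keeping α and β as fractions makes the test kα ≤ i ≤ kβ exact, so the endpoints of the interval are never lost to rounding.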

6. Problem 4. Assumptions and data being the same as in Prob. 3, find the probability that in $n_1$ trials, following the n trials which produced E m times, the same event will occur $m_1$ times.

Solution. It suffices to take in formula (3)

$$(B_i) = \alpha_i;\qquad (A, B_i) = C_n^m\,p_i^m(1-p_i)^{n-m}$$

and

$$(C, B_i) = C_{n_1}^{m_1}\,p_i^{m_1}(1-p_i)^{n_1-m_1}$$

to find for the required probability this expression:

$$Q = C_{n_1}^{m_1}\,\frac{\sum_{i=1}^{k}\alpha_i p_i^{m+m_1}(1-p_i)^{n+n_1-m-m_1}}{\sum_{i=1}^{k}\alpha_i p_i^m(1-p_i)^{n-m}}. \tag{7}$$

Supposing again

$$p_1 = \frac1k,\quad p_2 = \frac2k,\quad \ldots,\quad p_k = 1;\qquad \alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac1k$$

and letting k → ∞, formula (7) in the limit becomes

$$Q = C_{n_1}^{m_1}\,\frac{\int_0^1 x^{m+m_1}(1-x)^{n+n_1-m-m_1}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx}. \tag{8}$$

This formula leads to the following conclusion: When the unknown probability p of an event E may have any value between the limits 0 and 1, and the a priori probability of its being contained between α and β is β − α (so that equal probabilities correspond to intervals of equal length), the probability that the event E will happen $m_1$ times in $n_1$ trials following n trials which produced E m times is given by formula (8).

In particular, for $n_1 = m_1 = 1$ (evaluating the integrals by the known formula $\int_0^1 x^a(1-x)^b\,dx = \frac{a!\,b!}{(a+b+1)!}$), we have

$$Q = \frac{m+1}{n+2}.$$

This is the much disputed "law of succession" established by Laplace.
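Formula (8) reduces to factorials through the known formula for $\int_0^1 x^a(1-x)^b\,dx$. The sketch below (illustrative names) evaluates (8) exactly, confirms the law of succession for small m and n, and checks that the probabilities for $m_1 = 0, 1, \ldots, n_1$ add up to 1.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Integral of x^a (1 - x)^b over [0, 1], equal to a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive(m, n, m1, n1):
    """Formula (8): probability of m1 occurrences in n1 further trials after m in n."""
    return (comb(n1, m1) * beta_integral(m + m1, n + n1 - m - m1)
            / beta_integral(m, n - m))


print(predictive(3, 10, 1, 1), Fraction(3 + 1, 10 + 2))    # law of succession
print(sum(predictive(3, 10, j, 5) for j in range(6)))       # total probability
```

It prints `1/3 1/3` and then `1`. With m = n the law gives (n + 1)/(n + 2) against 1/(n + 2), that is, odds of n + 1 to 1; for n = 1,826,213 these are the odds of 1,826,214 to 1 in Laplace's example quoted later in this chapter.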

7. Bayes' formula, and other conclusions derived from it, are necessary consequences of fundamental concepts and theorems of the theory of probability. Once we admit these fundamentals, we must admit Bayes' formula and all that follows from it.

But the question arises: When may the various results established in this chapter be legitimately applied? In general, they may be applied whenever all the conditions of their validity are fulfilled; and in some artificial theoretical problems like those considered in this chapter, they are unquestionably applied legitimately. But in practical applications it is not easy to make sure that all the conditions of validity are fulfilled, though there are some practical problems in which the use of Bayes' formula is perfectly legitimate.* In the history of probability it has happened that even the most illustrious men, like Laplace and Poisson, went farther than they were entitled to go and made free use, principally of formulas (6) and (8), in various important practical problems. Against the indiscriminate use of these formulas sharp objections have been raised by a number of authors, especially in modern times.

The first objection is of a general nature and strikes at the very existence of a priori probabilities. If an urn is given to us and we know only that it contains white and black balls, it is evident that no means are available to estimate the a priori probabilities of various hypotheses as to the proportion of white balls. Hence, critics say, a priori probabilities do not exist at all, and it is futile to attempt to apply Bayes' formula to an urn with an unknown proportion of balls. At first this objection may appear very convincing, but its force is somewhat lessened by considering the peculiar mode of existence of mathematical objects.

* One such problem can be found in an excellent book by Thornton C. Fry, "Probability and Its Engineering Uses," New York, 1928.

Some property of integers, unknown to me, is not present in my 
mind, but it is hardly permissible to say that it does not exist; for it does 
exist in the minds of those who discover this property and know how to 
prove it. 

Similarly, our urn might have been filled by some person, or selected 
from among urns with known contents. To this person the a priori 
probabilities of various proportions of white and black balls might 
have been known. To us they are unknown, but this should not prevent 
us from attributing to them some potential mode of existence at least as 
a sort of belief. 

To admit a belief in the existence of certain unknown numbers is common to all sciences where mathematical analysis is applied to the world of reality. If we are allowed to introduce the element of belief into such "exact" sciences as astronomy and physics, it would be only fair to admit it in practical applications of probability.

The second and very serious objection is directed against the use of 
formula (6), and for similar reasons against formula (8). Imagine, 
again, that we are provided with an urn containing an enormous number 
of white and black balls in completely unknown proportion. Our aim 
is to find the probability that the proportion of white balls to the total 
number of balls is contained between two given limits. To that end, we 
make a long series of trials as described in Prob. 5 and find that actually 
in n trials, white balls appeared m times. The probability we seek would 
result from Bayes* formula, provided numerical values of a priori proba-. 
bilities, assumed on belief to be existent, were known. Lacking such 
knowledge, an arbitrary assumption is made, namely, that all the a 
priori probabilities have the same value. Then, on account of tte 
enormous number of balls in our urn, formula (6) can be used as an 
approximate expression of P. It can be shown that, given an arbitrary 
positive number $\varepsilon$, however small, the probability of the inequalities

$$\frac{m}{n} - \varepsilon < p < \frac{m}{n} + \varepsilon$$

can be made as near to 1 as we please by taking the number of trials
greater than a certain number $N(\varepsilon)$ depending upon $\varepsilon$ alone. In other
words, with practical certainty we can expect the proportion of white
balls to the total number of balls in our urn to be contained within
arbitrarily narrow limits

$$\frac{m}{n} - \varepsilon \quad\text{and}\quad \frac{m}{n} + \varepsilon.$$
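As a numerical illustration of this concentration, here is a minimal Python sketch. It assumes that formula (6), with all a priori probabilities equal, amounts to the ratio of the integrals of $x^m(1-x)^{n-m}$ over the interval in question and over $(0, 1)$; the helper name `posterior_mass`, the midpoint rule, and the observed frequency $m/n = 0.6$ are illustrative choices, not taken from the text.

```python
import math

def posterior_mass(m, n, lo, hi, steps=20000):
    """Ratio of the integrals of x**m * (1 - x)**(n - m) over (lo, hi) and over (0, 1).

    This is the a posteriori probability of lo < p < hi when all a priori
    values of p are taken as equally likely (uniform prior)."""
    def log_f(x):
        return m * math.log(x) + (n - m) * math.log(1 - x)

    # Rescale by the maximum of the integrand (attained at x = m/n) to avoid underflow.
    log_max = log_f(m / n) if 0 < m < n else 0.0

    def integral(u, v):
        # Midpoint rule: the nodes never touch 0 or 1, so log_f is always defined.
        h = (v - u) / steps
        return h * sum(math.exp(log_f(u + (i + 0.5) * h) - log_max) for i in range(steps))

    return integral(max(lo, 0.0), min(hi, 1.0)) / integral(0.0, 1.0)


eps = 0.05
for n in (10, 100, 1000):
    m = 3 * n // 5  # observed frequency m/n = 0.6 in every case
    print(n, round(posterior_mass(m, n, m / n - eps, m / n + eps), 4))
```

The printed probabilities should increase toward 1 as $n$ grows, which is the conclusion stated above; the objection that follows concerns the equal-prior assumption, not this arithmetic.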
A conclusion like this would certainly be of the greatest importance.
But it is vitiated by the arbitrary assumption made at the beginning.
The same is true of formula (8) and of Laplace's “law of succession.”
The objection against using formulas (6) and (8) in circumstances where 
we are not entitled to use them appears to us as irrefutable, and the 
numerical applications made by Laplace and others cannot inspire much 
confidence. 

As an example of the extremes to which the illegitimate use of formulas
(6) and (8) may lead, we quote from Laplace: 

For example, placing the most ancient epoch of history at five thousand years,
or 1,826,213 days, and the Sun having risen constantly in this interval at each
revolution of twenty-four hours, the odds are 1,826,214 to one that it will rise
again tomorrow.

It appears strange that as great a man as Laplace could make such a 
statement in earnest. However, under proper conditions, it would 
not be so objectionable. If, from the enormous number $N + 1$ of
urns containing each N black and white balls in all possible proportions, 
one urn is taken and 1,826,213 balls are drawn and returned, and they 
all turn out to be white, then nobody can deny that there are very nearly 
1,826,214 chances against one that the next ball will also be white. 
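Under the urn model just described the arithmetic can be checked directly. A Python sketch, assuming $N + 1$ urns in which urn $i$ holds $i$ white balls out of $N$ (so that all proportions occur), each equally likely to be taken; `next_white_probability` and `succession_limit` are hypothetical helper names. As $N$ grows the finite-urn answer approaches $(n + 1)/(n + 2)$, and for $n = 1{,}826{,}213$ this limit corresponds to odds of 1,826,214 to one.

```python
from fractions import Fraction

def next_white_probability(N, n):
    """P(next ball white | n draws, each returned, all white), the urn being taken
    from N + 1 equally likely urns in which urn i holds i white balls out of N.
    Floating point is used so that N may be large."""
    num = sum((i / N) ** (n + 1) for i in range(N + 1))
    den = sum((i / N) ** n for i in range(N + 1))
    return num / den

def succession_limit(n):
    """Limit (n + 1)/(n + 2) of the above as N grows without bound."""
    return Fraction(n + 1, n + 2)

for N in (10, 100, 10_000):
    print(N, next_white_probability(N, 5), float(succession_limit(5)))

p = succession_limit(1_826_213)
print(p / (1 - p))   # odds in favour of one more white ball: 1826214
```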

Problems for Solution 

1. Three urns of the same appearance have the following proportions of white and 
black balls: 

Urn 1: 1 white, 2 black balls 
Urn 2: 2 white, 1 black ball 
Urn 3: 2 white, 2 black balls 

One of the urns is selected and one ball is drawn. It turns out to be white. What
is the probability that the third urn was chosen? Ans. $1/3$.

2. Under the same conditions, what is the probability of drawing a white ball
again, the first one not having been returned? Ans. $1/3$.

3. An urn containing 5 balls has been filled up by taking 5 balls from another urn,
which originally had 5 white and 5 black balls. A ball is taken from the first urn, and
it happens to be black. What is the probability of drawing a white ball from among
the remaining 4? Ans. $5/9$.

4. From an urn containing 6 white and 5 black balls, 5 balls are transferred into an
empty second urn. From there, 3 balls are transferred into an empty third urn and,
finally, one ball is drawn from the latter. It turns out to be white. What is the
probability that all 5 balls transferred from the first urn are white? Ans. $1/42$.
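Problems 1 to 4 can be checked by exact enumeration of the hypotheses. A Python sketch using `fractions.Fraction` so that the answers come out as exact fractions; the variable names are illustrative, and each problem is coded directly from its statement (each urn chosen with probability 1/3 in Problems 1 and 2, hypergeometric transfers in Problems 3 and 4).

```python
from fractions import Fraction
from math import comb

# Problems 1 and 2: three urns given as (white, black), each chosen with probability 1/3.
urns = [(1, 2), (2, 1), (2, 2)]
prior = Fraction(1, 3)

first_white = sum(prior * Fraction(w, w + b) for w, b in urns)
prob1 = prior * Fraction(2, 4) / first_white          # posterior of the third urn

# Second ball white after a white first ball that was not returned.
both_white = sum(prior * Fraction(w, w + b) * Fraction(w - 1, w + b - 1) for w, b in urns)
prob2 = both_white / first_white

# Problem 3: w white balls among the 5 taken from the urn with 5 white and 5 black.
num = den = Fraction(0)
for w in range(6):
    pw = Fraction(comb(5, w) * comb(5, 5 - w), comb(10, 5))
    black = Fraction(5 - w, 5)                         # first ball drawn is black
    den += pw * black
    num += pw * black * Fraction(w, 4)                 # then a white one among the 4 left
prob3 = num / den

# Problem 4: w1 white among 5 moved from (6 white, 5 black), w2 white among 3 moved on.
num = den = Fraction(0)
for w1 in range(6):
    p1 = Fraction(comb(6, w1) * comb(5, 5 - w1), comb(11, 5))
    for w2 in range(4):
        p2 = Fraction(comb(w1, w2) * comb(5 - w1, 3 - w2), comb(5, 3))
        joint = p1 * p2 * Fraction(w2, 3)              # and the final ball is white
        den += joint
        if w1 == 5:
            num += joint
prob4 = num / den

print(prob1, prob2, prob3, prob4)   # prints 1/3 1/3 5/9 1/42
```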

5. Conditions and notations being the same as in Prob. 3 (page 66), show that the
probability for an event to occur in the $(n + 1)$st trial, granted that it has occurred
in all the preceding $n$ trials, is never less than the probability for the same event to
occur in the $n$th trial, granting that it has occurred in the preceding $n - 1$ trials.

Hint: it must be proved that 
For that purpose, use Cauchy's inequality

$$\left(\sum_k a_k b_k\right)^2 \le \sum_k a_k^2 \sum_k b_k^2.$$


6. Assuming that the unknown probability $p$ of an event $E$ can have any value
between 0 and 1 and that the a priori probability of its being contained in the interval
$(\alpha, \beta)$ is equal to the length of this interval, prove the following theorem: The
probability a posteriori of the inequality

$$p \le \sigma$$

after $E$ has occurred $m$ times in $n$ trials is equal to the probability of at least $m + 1$
successes in $n + 1$ independent trials with constant probability $\sigma$. (See Prob. 13,
page 69.)
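The theorem can be checked numerically before it is proved. A Python sketch, assuming the uniform a priori law of the problem, so that the a posteriori probability of $p \le \sigma$ is the ratio of $\int_0^\sigma x^m(1-x)^{n-m}\,dx$ to $\int_0^1 x^m(1-x)^{n-m}\,dx = m!\,(n-m)!/(n+1)!$; the case $m = 7$, $n = 20$ and the helper names are illustrative.

```python
import math
from math import comb

def at_least(k_min, N, sigma):
    """Probability of at least k_min successes in N independent trials of probability sigma."""
    return sum(comb(N, k) * sigma ** k * (1 - sigma) ** (N - k) for k in range(k_min, N + 1))

def posterior_le(sigma, m, n, steps=50_000):
    """A posteriori probability of p <= sigma after m occurrences in n trials,
    uniform a priori law: midpoint rule for the integral over (0, sigma),
    divided by the exact integral over (0, 1), m! (n - m)! / (n + 1)!."""
    h = sigma / steps
    area = h * sum(((i + 0.5) * h) ** m * (1 - (i + 0.5) * h) ** (n - m) for i in range(steps))
    whole = math.factorial(m) * math.factorial(n - m) / math.factorial(n + 1)
    return area / whole

m, n = 7, 20
for sigma in (0.2, 0.35, 0.5, 0.8):
    print(sigma, posterior_le(sigma, m, n), at_least(m + 1, n + 1, sigma))
```

The two columns should agree to within the error of the quadrature.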

7. Assumptions being the same as in the preceding problem, find approximately 
the probability a posteriori of the inequalities 

^ P ^ 

it being known that in 200 trials an event with the probability p has occurred 105 
times. Ans. Using the preceding problem and applying Markoff’s method, we find 
$P = 0.846$.

8. An urn contains $N$ white and black balls in unknown proportion. The number
of white balls hypothetically may be

$$0, 1, 2, \ldots, N$$

and all these hypotheses are considered as equally likely. Altogether $n$ balls are
taken from the urn, $m$ of which turned out to be white. Without returning these
balls, a new group of $n_1$ balls is taken, and it is required to find the probability that
among them there are $m_1$ white balls. Naturally, the total number of balls is so
large as to have $n + n_1 < N$. Ans. The required probability has the same expression
as in Prob. 4, page 69.
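A Python sketch of Prob. 8 by direct enumeration over the $N + 1$ equally likely hypotheses, with both groups drawn without replacement; the helper names and the particular numbers ($n = 5$, $m = 3$, $n_1 = 4$, $m_1 = 2$) are illustrative. The printed probability should come out the same for every $N$ satisfying $n + n_1 < N$, consistent with an answer that does not involve $N$.

```python
from fractions import Fraction
from math import comb

def C(a, b):
    """Binomial coefficient, taken as 0 outside 0 <= b <= a (also for negative a)."""
    return comb(a, b) if 0 <= b <= a else 0

def second_sample_probability(N, n, m, n1, m1):
    """P(m1 white among n1 further balls | m white among the first n balls),
    the hypotheses 0, 1, ..., N white balls being equally likely."""
    num = den = Fraction(0)
    for K in range(N + 1):                      # hypothetical number of white balls
        first = Fraction(C(K, m) * C(N - K, n - m), C(N, n))
        second = Fraction(C(K - m, m1) * C(N - K - (n - m), n1 - m1), C(N - n, n1))
        den += first                            # the common factor 1/(N + 1) cancels
        num += first * second
    return num / den

for N in (12, 30, 60):
    print(N, second_sample_probability(N, n=5, m=3, n1=4, m1=2))
```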

Polynomials ordinarily called “Hermite's polynomials,” although they were
discovered by Laplace, are defined by

$$H_n(y) = e^{y^2/2}\,\frac{d^n e^{-y^2/2}}{dy^n}.$$

The first four of them are

$$H_1(y) = -y; \quad H_2(y) = y^2 - 1; \quad H_3(y) = -y^3 + 3y; \quad H_4(y) = y^4 - 6y^2 + 3.$$

They possess the remarkable property of orthogonality:

$$\int_{-\infty}^{\infty} e^{-y^2/2} H_m(y) H_n(y)\,dy = 0 \qquad (m \neq n)$$

while

$$\int_{-\infty}^{\infty} e^{-y^2/2} H_n(y)^2\,dy = n!\sqrt{2\pi}.$$

Under very general conditions, a function $f(y)$ defined in the interval $(-\infty, +\infty)$
can be represented by a series

$$f(y) = a_0 + a_1 H_1(y) + a_2 H_2(y) + \cdots$$

where in general

$$a_k = \frac{1}{k!\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2} f(y) H_k(y)\,dy.$$

Let 


and A* 


n a(l — a) 

provided 0 < a < 1. 

9. Prove the validity of the following expansion indicated by Ch. Jordan: 


(n + 1)1 
m!(n — m)!' 


_ 3j)»- 


-1- 
lir L « 


-2a 




AHi(y) + 


+ 2 

2n r- (lln 
2n(n 

for $0 \le x \le 1$, where $y$ is a new variable connected to $x$ by the equation


In + 6)a(l - ^ . 1 

:+2)^+3r- + • • • J 


Hint: Consider the development in a series of Hermite's polynomials of the
function $f(y)$, where

$$f(y) = 0 \quad\text{if either}\quad y < -\lambda a \quad\text{or}\quad y > \lambda(1 - a).$$


10. Assuming that the conditions of validity of formula (6) are fulfilled, show that 
the a posteriori probability of the inequalities 


m a(l — a) m ja(l — a) 

- - ^ <p <-+U -; 

n \ n n \ n 


<p <■ 

1 

can be expanded into a convergent series 


m 

n 


2 r _ U ^ 2n - (lln + 6)a(l - a) 

■\/ZrJo* ^ ~ “) 


When $n$ is large and $a$ is near neither 0 nor 1, two terms of this series suffice
to give a good approximation to $P$ (Ch. Jordan). Apply this to Prob. 7.

Ans. 0.84585. 
CHAPTER V


USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 
OF PROBABILITY 

1. The combined use of the theorems of total and compound probability
very often leads to an equation in finite differences which, together
with the initial conditions supplied by a problem itself, serves to determine
an unknown probability. This method of attack is very powerful,
and it is often resorted to, especially in the more difficult cases. In this
chapter the use of equations in finite differences, applied to a few selected
and comparatively easy examples, will be shown; but in Chap. VIII
we shall apply the method to a class of interesting and historically
important problems.

Certain preliminary explanations are necessary at this point. Again 
we consider a series of trials resulting in an event $E$ or its opposite, $F$,
but this time we suppose that the trials are dependent, so that the
probability of E at a certain trial may vary according to the available 
information concerning the results of some of the other trials. 

A simple and interesting case of dependent trials arises if we suppose
that the probability of $E$ in the $(n + 1)$st trial receives a definite value
$\alpha$ if $E$ has happened in the preceding $n$th trial, and this value does not
change whatever further information we may possess concerning the
results of trials preceding the $n$th. Also, the probability of $E$ in the
$(n + 1)$st trial receives another determined value $\beta$ if $E$ failed in
the $n$th trial, no matter what happened in the trials preceding the $n$th.

We have a simple illustration of this kind of dependence, if we suppose 
that drawings are made from an urn containing black and white balls in 
a known proportion, and that each ball drawn is returned to the urn, but 
only after the next drawing has been made. It is obvious that the probability
that the $(n + 1)$st ball drawn will be white becomes perfectly
definite if we know what was the color of the ball immediately preceding,
and it remains the same no matter what we know about the colors of the
1st, 2nd, ..., $(n - 1)$st balls.

If the trials depend on each other in the above-defined manner, we
say that they constitute a “simple chain,” to use the terminology of the
late A. A. Markoff, who was the first to make a profound study of
dependent trials of this and similar, but more complicated, types. It is
implied in the definition of a simple chain that it breaks into two separate
parts as soon as the result of a certain trial becomes known. For
instance, if the result of the fifth trial is known, trials 6, 7, 8, ... become
independent of trials 1, 2, 3, 4, and the chain breaks into two distinct
parts: the trials preceding the fifth, and those following it. If the
results of trials 1, 2, 3, ..., $n - 1$ remain unknown, the event $E$
in the following $n$th trial has a certain probability which we shall denote
by $p_n$. Also, if it becomes known that $E$ happened at trial $k$, where
$k < n - 1$, the probability of $E$ happening in the $n$th trial receives a
different value, $p_n^{(k)}$. It is important to find means to determine the
probability $p_n$, the a priori probability of $E$ in the $n$th trial when the
results of the preceding trials remain unknown, as well as to determine
the probability $p_n^{(k)}$ of $E$ in the $n$th trial when we possess the positive
information that $E$ has materialized in the $k$th ($k < n - 1$) trial.

2. Thus we are led to the following problem concerning simple chains 
of dependent trials: 

Problem 1. The initial probability $p_1$ of the event $E$ in a simple
chain of trials being known, find the probability $p_n$ of $E$ in the $n$th trial
when the results of the preceding trials remain completely unknown.
Also, find the probability $p_n^{(k)}$ of $E$ in the $n$th trial when it is known that
$E$ has happened in the $k$th trial where $k < n - 1$.

Solution. In the $n$th trial the event $E$ can happen either preceded
by $E$ in the $(n - 1)$st trial, the probability of which is $p_{n-1}$, or preceded
by $F$ in the $(n - 1)$st trial, the probability of which is $1 - p_{n-1}$. By
the theorem of compound probability, the probability of the succession
$EE$ is $p_{n-1}\alpha$, while the probability of the succession $FE$ is $(1 - p_{n-1})\beta$.
Hence, the total probability $p_n$ is

$$p_n = \alpha p_{n-1} + \beta(1 - p_{n-1}) = (\alpha - \beta)p_{n-1} + \beta. \tag{1}$$

This is an ordinary equation in finite differences. It has a particular
solution

$$p_n = c = \text{const.}$$

where $c$ is determined by the equation

$$c = (\alpha - \beta)c + \beta,$$

whence

$$c = \frac{\beta}{1 + \beta - \alpha},$$

provided $1 + \beta - \alpha \neq 0$.¹ On the other hand, the corresponding
homogeneous equation

$$y_n = (\alpha - \beta)y_{n-1}$$

has a general solution

$$y_n = C(\alpha - \beta)^{n-1}$$

involving an arbitrary constant $C$. Adding to it the previously found
particular solution, we obtain the general solution of (1) in the form

$$p_n = C(\alpha - \beta)^{n-1} + \frac{\beta}{1 + \beta - \alpha}.$$

¹ If $1 + \beta - \alpha = 0$, or $\alpha - \beta = 1$, we necessarily have $\alpha = 1$, $\beta = 0$, which
means that $E$ must occur in all the trials if it actually occurs in the first trial, and
never occurs if it does not actually occur at the outset. This case, as well as the other
extreme case in which $\alpha - \beta = -1$, can therefore be excluded as not possessing real
interest.

The arbitrary constant $C$ is determined by the initial condition

$$C + \frac{\beta}{1 + \beta - \alpha} = p_1,$$

so that finally

$$p_n = \frac{\beta}{1 + \beta - \alpha} + \left(p_1 - \frac{\beta}{1 + \beta - \alpha}\right)(\alpha - \beta)^{n-1}.$$

If

$$p_1 = \frac{\beta}{1 + \beta - \alpha},$$

we see that $p_n$ does not depend on $n$ and is constantly equal to $p_1$. Because
we may exclude the cases $\alpha - \beta = 1$ or $\alpha - \beta = -1$, so that
$\alpha - \beta$ is contained between $-1$ and $1$, we may conclude from the above
expression that $p_n$, if not a constant, at any rate tends to the limit

$$\frac{\beta}{1 + \beta - \alpha}$$

as $n$ increases indefinitely.
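Equation (1) and its closed-form solution can be compared numerically, and the definition of a simple chain can be simulated directly. A Python sketch; the values $\alpha = 0.7$, $\beta = 0.2$, $p_1 = 0.9$ are arbitrary illustrations (for them $\beta/(1 + \beta - \alpha) = 0.4$), the function names are hypothetical, and the simulation draws each trial with probability $\alpha$ or $\beta$ according to the outcome of the trial before it.

```python
import random

def p_closed(n, p1, alpha, beta):
    """p_n = c + (p_1 - c)(alpha - beta)^(n-1) with c = beta / (1 + beta - alpha)."""
    c = beta / (1 + beta - alpha)
    return c + (p1 - c) * (alpha - beta) ** (n - 1)

def p_iterated(n, p1, alpha, beta):
    """Apply equation (1), p_n = alpha p_{n-1} + beta (1 - p_{n-1}), n - 1 times."""
    p = p1
    for _ in range(n - 1):
        p = alpha * p + beta * (1 - p)
    return p

def p_simulated(n, p1, alpha, beta, runs=100_000, seed=1):
    """Monte Carlo estimate of p_n: each trial succeeds with probability alpha
    after a success and beta after a failure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        e = rng.random() < p1                      # outcome of the first trial
        for _ in range(n - 1):
            e = rng.random() < (alpha if e else beta)
        hits += e
    return hits / runs

alpha, beta, p1 = 0.7, 0.2, 0.9                    # illustrative values only
for n in (1, 2, 5, 20, 60):
    print(n, p_iterated(n, p1, alpha, beta), p_closed(n, p1, alpha, beta))
print(p_simulated(5, p1, alpha, beta))
```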

As to $p_n^{(k)}$, we find in a similar way that it satisfies the equation

$$p_n^{(k)} = \alpha p_{n-1}^{(k)} + \beta\left(1 - p_{n-1}^{(k)}\right) \tag{2}$$

of the same form as equation (1). But the initial condition in this
case is $p_{k+1}^{(k)} = \alpha$, because the probability of $E$ happening in the $(k + 1)$st
trial is $\alpha$ when it is known that $E$ occurred in the preceding trial. The
solution of (2) satisfying this initial condition is

$$p_n^{(k)} = \frac{\beta}{1 + \beta - \alpha} + \frac{1 - \alpha}{1 + \beta - \alpha}(\alpha - \beta)^{n-k}.$$

As the second term in the right-hand member decreases in absolute value with
increasing $n$ and finally becomes less than any given number, we see that the
positive information concerning the result of the $k$th trial has less and less
influence on the probability of $E$ in the following trials, and in remote
trials this influence becomes quite insignificant.


Example. An urn contains $a$ white and $b$ black balls, and a series of drawings of
one ball at a time is made, the ball removed being returned to the urn immediately
after the taking of the next following ball. What is the probability that the $n$th ball
drawn is white when: (a) nothing is known about the preceding drawings; (b) the $k$th
ball drawn is white?


In this particular problem we have

$$\alpha = \frac{a - 1}{a + b - 1}, \qquad \beta = \frac{a}{a + b - 1}, \qquad p_1 = \frac{a}{a + b},$$

$$\frac{\beta}{1 + \beta - \alpha} = \frac{a}{a + b} = p_1.$$

Thus

$$p_n = p_1 = \frac{a}{a + b};$$

that is, the probability for any ball drawn to be white is the same as that for the
first ball, nothing being known about the results of the previous drawings. The
expression for $p_n^{(k)}$ is, in this example,

$$p_n^{(k)} = \frac{a}{a + b} + (-1)^{n-k}\frac{b}{(a + b)(a + b - 1)^{n-k}},$$

so, for instance, if $a = 1$, $b = 2$, $n = 5$, $k = 3$,

$$p_5^{(3)} = \frac{1}{3} + \frac{2}{3 \cdot 2^2} = \frac{1}{2}.$$

The information that the third ball was white raises to $\frac{1}{2}$ the probability that the fifth
ball will be white; it would be $\frac{1}{3}$ without such information.
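For the urn example, a Python sketch that iterates equation (2) with exact fractions, taking $\alpha$ and $\beta$ from the composition of the urn while the previous ball is held out, and compares the result with the closed expression for $p_n^{(k)}$; the function names are hypothetical.

```python
from fractions import Fraction

def p_given_kth_white(a, b, n, k):
    """P(ball n white | ball k white) under the lagged-return scheme, by iterating (2).
    While a ball is held out, the next draw is white with probability (a-1)/(a+b-1)
    if the held ball is white and a/(a+b-1) if it is black."""
    alpha = Fraction(a - 1, a + b - 1)
    beta = Fraction(a, a + b - 1)
    p = Fraction(1)                                 # ball k is known to be white
    for _ in range(n - k):
        p = alpha * p + beta * (1 - p)
    return p

def closed_form(a, b, n, k):
    """a/(a+b) + (-1)^(n-k) b / ((a+b)(a+b-1)^(n-k))."""
    return Fraction(a, a + b) + Fraction((-1) ** (n - k) * b, (a + b) * (a + b - 1) ** (n - k))

print(p_given_kth_white(1, 2, 5, 3), closed_form(1, 2, 5, 3))   # prints 1/2 1/2
```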

3. The next problem chosen to illustrate the use of difference equations
is interesting in several respects. It was first propounded and
solved by de Moivre.

Problem 2. In a series of independent trials, an event $E$ has the
constant probability $p$. If, in this series, $E$ occurs at least $r$ times in
succession, we say that there is a run of $r$ successes. What is the probability
of having a run of $r$ successes in $n$ trials, where naturally $n > r$?

Solution. Let us denote by $y_n$ the unknown probability of a run of
$r$ in $n$ trials. In $n + 1$ trials the probability of a run of $r$ will then be
$y_{n+1}$. Now, a run of $r$ in $n + 1$ trials can happen in two mutually
exclusive ways: first, if there is a run of $r$ in the first $n$ trials, and second,
if such a run can be obtained only in $n + 1$ trials. The probability of
the first hypothesis is $y_n$. To find the probability of the second hypothesis,
we observe that it requires the simultaneous realization of the following
conditions:

(a) There is no run of $r$ in the first $n - r$ trials, the probability of
which is $1 - y_{n-r}$. (b) In the $(n - r + 1)$st trial, $E$ does not occur,
the probability of which is $q = 1 - p$. (c) Finally, $E$ occurs in the
remaining $r$ trials, the probability of which is $p^r$.

As (a), (b), (c) are independent events, their simultaneous materialization
has the probability

$$(1 - y_{n-r})qp^r.$$

At the same time, this is the probability of the second hypothesis.
Adding it to $y_n$ we must obtain the total probability $y_{n+1}$. Thus

$$y_{n+1} = y_n + (1 - y_{n-r})p^r q \tag{3}$$

and this is an ordinary linear difference equation of the order $r + 1$.
Together with the obvious initial conditions

$$y_0 = y_1 = \cdots = y_{r-1} = 0, \qquad y_r = p^r,$$

it serves to determine $y_n$ completely for $n = r + 1, r + 2, \ldots$. For
instance, taking $n = r$ we derive from (3)

$$y_{r+1} = p^r + p^r q.$$

Again, taking $n = r + 1$, we obtain

$$y_{r+2} = p^r + 2p^r q$$

and so forth. Although, proceeding thus, step by step, we can find the
required probability $y_n$ for any given $n$, this method becomes very laborious
for large $n$ and does not supply us with information as to the behavior
of $y_n$ for large $n$. It is preferable, therefore, to apply known methods of
solution to equation (3). First we can obtain a homogeneous equation
by introducing $z_n = 1 - y_n$ instead of $y_n$. The resulting equation in
$z_n$ is

$$z_{n+1} - z_n + qp^r z_{n-r} = 0 \tag{4}$$

and the corresponding initial conditions are:

$$z_0 = z_1 = \cdots = z_{r-1} = 1; \qquad z_r = 1 - p^r.$$
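The step-by-step use of equation (3) is easy to mechanize, and for small $n$ it can be checked against a brute-force sum over all $2^n$ outcome sequences. A Python sketch; `run_probability` and `run_probability_brute` are illustrative names, and the brute force is only practical for small $n$.

```python
from itertools import product

def run_probability(n, r, p):
    """y_n from equation (3), y_{m+1} = y_m + (1 - y_{m-r}) p^r q,
    with y_0 = ... = y_{r-1} = 0 and y_r = p^r."""
    if n < r:
        return 0.0
    q = 1 - p
    y = [0.0] * (n + 1)
    y[r] = p ** r
    for m in range(r, n):
        y[m + 1] = y[m] + (1 - y[m - r]) * p ** r * q
    return y[n]

def run_probability_brute(n, r, p):
    """Sum the probabilities of all outcome sequences containing r successes in a row."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        longest = current = 0
        for s in seq:
            current = current + 1 if s else 0
            longest = max(longest, current)
        if longest >= r:
            k = sum(seq)
            total += p ** k * (1 - p) ** (n - k)
    return total

print(run_probability(20, 5, 0.5))   # a run of 5 heads in 20 tosses of a coin
```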

We could use the method of particular solutions as in the preceding
problem, but it is more convenient to use the method of generating
functions. The power series in $\xi$

$$\varphi(\xi) = z_0 + z_1\xi + z_2\xi^2 + \cdots$$

is the so-called generating function of the sequence $z_0, z_1, z_2, \ldots$.
If we succeed in finding its sum as a definite function of $\xi$, the development
of this function into power series will have precisely $z_n$ as the coefficient
of $\xi^n$. To obtain $\varphi(\xi)$ let us multiply both members of the preceding
series by the polynomial

$$1 - \xi + qp^r\xi^{r+1}.$$
The multiplication performed, we have

$$(1 - \xi + qp^r\xi^{r+1})\varphi(\xi) = z_0 + (z_1 - z_0)\xi + \cdots + (z_{r-1} - z_{r-2})\xi^{r-1} + (z_r - z_{r-1})\xi^r + (z_{r+1} - z_r + qp^r z_0)\xi^{r+1} + \cdots.$$

In the right-hand member the terms involving $\xi^{r+1}, \xi^{r+2}, \ldots$ have
vanishing coefficients by virtue of equation (4); also $z_k - z_{k-1} = 0$ for
$k = 1, 2, 3, \ldots, r - 1$, while

$$z_0 = 1 \quad\text{and}\quad z_r - z_{r-1} = -p^r,$$

so that

$$(1 - \xi + qp^r\xi^{r+1})\varphi(\xi) = 1 - p^r\xi^r$$

and

$$\varphi(\xi) = \frac{1 - p^r\xi^r}{1 - \xi + qp^r\xi^{r+1}}.$$

The generating function $\varphi(\xi)$ thus is a rational function and can be
developed into a power series of $\xi$ according to the known rules. The
coefficient of $\xi^n$ gives the general expression for $z_n$. Without any difficulty,
we find the following expression for $z_n$:

$$z_n = \beta_{n,r} - p^r\beta_{n-r,r} \tag{5}$$

where

$$\beta_{n,r} = \sum_{l \ge 0} (-1)^l \binom{n - lr}{l}(qp^r)^l$$

(the sum extending over all $l$ with $n - lr \ge l$), and $\beta_{n-r,r}$ is obtained by
substituting $n - r$ instead of $n$. If $n$ is not very large compared with $r$,
formula (5) can be used to compute $z_n$ and

$$y_n = 1 - z_n.$$

For instance, if $n = 20$, $r = 5$, and $p = q = \frac{1}{2}$, we easily find

$$z_{20} = 1 - \frac{15}{64} + \frac{45}{64^2} - \frac{10}{64^3} - \frac{1}{32}\left(1 - \frac{10}{64} + \frac{10}{64^2}\right)$$

and hence

$$z_{20} = 0.75013$$

correct to five decimals; $y_{20} = 0.24987$ is the probability of a run of 5
heads in 20 tossings of a coin.
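Formula (5) translates directly into code. A Python sketch with exact fractions; `beta_sum` is a hypothetical name for $\beta_{n,r}$, and the sum is stopped once $n - lr < l$, where the binomial coefficient vanishes. For $n = 20$, $r = 5$, $p = \frac{1}{2}$ it returns $z_{20} = 98321/131072 \approx 0.75013$.

```python
from fractions import Fraction
from math import comb

def beta_sum(n, r, p):
    """beta_{n,r} = sum over l >= 0 of (-1)^l C(n - l r, l) (q p^r)^l,
    the coefficient of xi^n in 1 / (1 - xi + q p^r xi^(r+1)); 0 when n < 0."""
    q = 1 - p
    total, l = 0, 0
    while n - l * r >= l:
        total += (-1) ** l * comb(n - l * r, l) * (q * p ** r) ** l
        l += 1
    return total

def z_formula5(n, r, p):
    """z_n = beta_{n,r} - p^r beta_{n-r,r}, formula (5)."""
    return beta_sum(n, r, p) - p ** r * beta_sum(n - r, r, p)

z20 = z_formula5(20, 5, Fraction(1, 2))
print(z20, float(z20), float(1 - z20))
```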

4. But if $n$ is large in comparison with $r$, formula (5) would require
so much labor that it is preferable to seek for an approximate expression
for $z_n$ which will be useful for large values of $n$. It often happens in
many branches of mathematics, but especially in the theory of
probability, that exact solutions of problems are in certain cases of no
use. That raises the question of how to supplant them by convenient
approximate formulas that readily yield the required numbers.
Therefore, it is an important problem to find approximate formulas where
exact ones cease to work. Owing to the general importance of approximations,
it will not be out of order to enter into a somewhat long and
complicated investigation to obtain a workable approximate solution
of our problem in the interesting case of a large $n$.

Since $\varphi(\xi)$ is a rational function, the natural way to get an appropriate
expression of $z_n$ would be to resolve $\varphi(\xi)$ into simple fractions, corresponding
to various roots of the denominator, and expand those fractions in
power series of $\xi$. However, to attain definite conclusions following this
method, we must first seek information concerning roots of the equation

$$1 - \xi + qp^r\xi^{r+1} = 0.$$

5. Let

$$f(\xi) = \xi - 1 - \alpha\xi^{r+1}$$

where

$$\alpha = p^r(1 - p).$$

When $p$ varies from 0 to 1, the maximum of $p^r(1 - p)$ is attained for

$$p = \frac{r}{r + 1}$$

and is $r^r/(r + 1)^{r+1}$, so that $\alpha \le r^r/(r + 1)^{r+1}$ in all cases.

To deal with the most interesting case, we shall assume

$$p < \frac{r}{r + 1}, \tag{6}$$

which involves

$$\alpha < \frac{r^r}{(r + 1)^{r+1}},$$

and we leave it to the reader to discover how the following discussion
should be modified if $p \ge r/(r + 1)$.

When $\xi$ starts to increase from 0, the function $f(\xi)$ steadily increases
and attains a positive maximum for $\xi = \xi_0$, where

$$(r + 1)\alpha\xi_0^r = 1,$$

after which $f(\xi)$ decreases steadily to negative infinity. Hence, there
are two positive roots of the equation $f(\xi) = 0$: $\xi_1$, which is less than
$(r + 1)/r$, and another root greater than this number. This root is $1/p$ if
condition (6) is fulfilled.

The remaining roots are all imaginary if $r$ is odd, and there is one
negative root among them if $r$ is even.

Now we shall prove that the absolute value of every imaginary or
negative root is $> 1/p$. Let $\rho$ be the absolute value of any such root.
We have first

$$f(\rho) = \rho - 1 - \alpha\rho^{r+1} < 0$$

(since $\rho = |1 + \alpha\xi^{r+1}| < 1 + \alpha\rho^{r+1}$, the root not being positive),
so that $\rho$ belongs either to the interval $(0, \xi_1)$ or to the interval $(1/p, +\infty)$,
and if we can show that $\rho > \xi_0$, then $\rho$ can be only $> 1/p$. If the root we
consider is negative, $\rho$ satisfies the equation

$$F(\rho) = 1 + \rho - \alpha\rho^{r+1} = 0,$$

and since $F(\rho)$ increases till a positive maximum for $\rho = \xi_0$ is reached, and
then decreases, the root of $F(\rho) = 0$ is necessarily $> \xi_0$. If $\xi = \rho e^{i\theta}$ is
an imaginary root of $f(\xi) = 0$, we have, equating imaginary parts,

$$\alpha\rho^r\,\frac{\sin(r + 1)\theta}{\sin\theta} = 1. \tag{7}$$

But, whatever $\theta$ may be,

$$\left|\frac{\sin(r + 1)\theta}{\sin\theta}\right| \le r + 1,$$

the equality sign being excluded if $\sin\theta \neq 0$.¹ Hence,

$$(r + 1)\alpha\rho^r > 1,$$

which implies $\rho > \xi_0$. The statement is thus completely proved.
6. The equation

$$\xi - 1 - \alpha\xi^{r+1} = 0$$

can be exhibited in the form

$$\frac{1}{\xi} + \alpha\xi^r = 1.$$

Substituting $\xi = \rho e^{i\theta}$ here, and again equating imaginary parts, we get

$$\alpha\rho^{r+1}\sin r\theta = \sin\theta$$

and, combining this with (7),

$$\rho = \frac{\sin(r + 1)\theta}{\sin r\theta}, \qquad \alpha = \frac{(\sin r\theta)^r\sin\theta}{[\sin(r + 1)\theta]^{r+1}}.$$

¹ The extreme values of the ratio $\dfrac{\sin m\theta}{\sin\theta}$ ($m$ an integer $> 1$) correspond to certain
roots of the equation $m\sin\theta\cos m\theta = \sin m\theta\cos\theta$, but for every root of this equation

$$\left|\frac{\sin m\theta}{\sin\theta}\right| = \frac{m}{\sqrt{1 + (m^2 - 1)\sin^2\theta}} \le m.$$

The equality sign is excluded if $\sin\theta$ differs from 0.
If the imaginary part of $\xi$ is positive, the argument $\theta$ is contained
between 0 and $\pi$. In this case, it cannot be less than $\pi/(r + 1)$ or greater
than $r\pi/(r + 1)$. For, if $0 < \theta < \pi/(r + 1)$, then, because $\sin x/x$ decreases as
$x$ increases from 0 to $\pi$,

$$\frac{\sin r\theta}{r\theta} > \frac{\sin(r + 1)\theta}{(r + 1)\theta}$$

or

$$\frac{\sin r\theta}{\sin(r + 1)\theta} > \frac{r}{r + 1}.$$

At the same time

$$\frac{\sin\theta}{\sin(r + 1)\theta} > \frac{1}{r + 1},$$

and hence

$$\alpha = \left(\frac{\sin r\theta}{\sin(r + 1)\theta}\right)^r\frac{\sin\theta}{\sin(r + 1)\theta} > \frac{r^r}{(r + 1)^{r+1}},$$

which is impossible. That $\theta$ cannot be greater than $r\pi/(r + 1)$ follows
simply, because in this case $\sin(r + 1)\theta$ and $\sin r\theta$ would be of opposite
signs and $\rho$ would be negative.

As $\pi/(r + 1) \le \theta \le r\pi/(r + 1)$, we have

$$\rho\sin\theta \ge \rho\sin\frac{\pi}{r + 1}.$$

On the other hand, $\sin x > 2x/\pi$ if $0 < x < \pi/2$, and $\rho > 1/p$. Hence,

$$\rho\sin\theta > \frac{2}{(r + 1)p}.$$

Thus, the imaginary parts of all complex roots have the same lower bound

$$\frac{2}{(r + 1)p}$$

of their absolute values.

7. Denoting the roots of the equation $f(\xi) = 0$ by

$$\xi_k \qquad (k = 1, 2, \ldots, r + 1),$$

we have

$$\varphi(\xi) = \sum_{k=1}^{r+1}\frac{1 - p\xi_k}{(1 - p)\xi_k(r + 1 - r\xi_k)\left(1 - \dfrac{\xi}{\xi_k}\right)}.$$

Hence, expanding each term into power series of $\xi$ and collecting
coefficients of $\xi^n$, we find

$$z_n = \sum_{k=1}^{r+1}\frac{1 - p\xi_k}{(1 - p)\xi_k^{n+1}}\cdot\frac{1}{r + 1 - r\xi_k}.$$

For every imaginary root, we have

$$\left|\frac{1 - p\xi_k}{(1 - p)\xi_k^{n+1}(r + 1 - r\xi_k)}\right| < \frac{r + 1}{r(1 - p)}\,p^{n+2},$$

since

$$|\xi_k| > \frac{1}{p}; \qquad \left|\frac{1 - p\xi_k}{\xi_k}\right| < 2p; \qquad \frac{1}{|r + 1 - r\xi_k|} < \frac{(r + 1)p}{2r}.$$

If $r$ is odd, there are $r - 1$ imaginary roots, and the part in the expression
of $z_n$ due to them is in absolute value less than

$$\frac{(r - 1)(r + 1)}{r(1 - p)}\,p^{n+2} < \frac{r}{1 - p}\,p^{n+2}.$$

The term corresponding to the root $1/p$ vanishes, so that finally

$$z_n = \frac{1 - p\xi_1}{(1 - p)\xi_1^{n+1}}\cdot\frac{1}{r + 1 - r\xi_1} + \theta\,\frac{r}{1 - p}\,p^{n+2},$$

where $|\theta| < 1$ and $\xi_1$ denotes the least positive root of the equation

$$1 - \xi + qp^r\xi^{r+1} = 0.$$

If $r$ is even, there is one negative root. The part of $z_n$ corresponding
to this root is less than

$$\frac{2p^{n+2}}{(1 - p)r}.$$

The whole contribution due to imaginary and negative roots is less than

$$\frac{r^2 - r}{r(1 - p)}\,p^{n+2} < \frac{r}{1 - p}\,p^{n+2}$$

in absolute value. Thus, no matter whether $r$ is odd or even, we have

$$z_n = \frac{1 - p\xi_1}{(1 - p)\xi_1^{n+1}}\cdot\frac{1}{r + 1 - r\xi_1} + \theta\,\frac{r}{1 - p}\,p^{n+2}; \qquad -1 < \theta < 1. \tag{8}$$

This is the required expression for $z_n$, excellently adapted to the case of a
large value for $n$, since then the remainder term involving $\theta$ is completely
negligible in comparison with the first principal term.
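Formula (8) can be compared with the exact values given by recurrence (4). A Python sketch, assuming condition (6), so that $f$ is negative at $\xi = 1$ and positive at $\xi = (r + 1)/r$ and $\xi_1$ can be bracketed between them; bisection is a swapped-in technique here, not the Gauss or Lagrange method of the text, and the function names are hypothetical. The difference between the two printed columns should stay below the bound $rp^{n+2}/(1 - p)$ of the remainder term.

```python
def smallest_positive_root(r, p, iterations=200):
    """Least positive root xi_1 of 1 - xi + q p^r xi^(r+1) = 0 by bisection.
    Assumes condition (6), p < r/(r+1): then f(xi) = xi - 1 - alpha xi^(r+1)
    is negative at xi = 1 and positive at xi = (r+1)/r."""
    alpha = (1 - p) * p ** r
    lo, hi = 1.0, (r + 1) / r
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid - 1 - alpha * mid ** (r + 1) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def z_principal(n, r, p):
    """Principal term of (8); the neglected part is below r p^(n+2) / (1 - p)."""
    xi1 = smallest_positive_root(r, p)
    return (1 - p * xi1) / ((1 - p) * xi1 ** (n + 1) * (r + 1 - r * xi1))

def z_exact(n, r, p):
    """z_n from recurrence (4), z_{m+1} = z_m - q p^r z_{m-r}, with its initial conditions."""
    q = 1 - p
    z = [1.0] * r + [1 - p ** r]
    for m in range(r, n):
        z.append(z[m] - q * p ** r * z[m - r])
    return z[n]

for n in (20, 40, 80):
    print(n, z_exact(n, 5, 0.5), z_principal(n, 5, 0.5))
```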
The root $\xi_1$ can be found either by direct solution of the trinomial
equation following Gauss' method, or by application of Lagrange's series.
Applying Lagrange's series, we have

$$\xi_1 = 1 + \alpha + \sum_{l=2}^{\infty}\frac{(lr + 2)(lr + 3)\cdots(lr + l)}{1 \cdot 2 \cdots l}\,\alpha^l,$$

$$\log\xi_1 = \alpha + \sum_{l=2}^{\infty}\frac{(lr + 1)(lr + 2)\cdots(lr + l - 1)}{1 \cdot 2 \cdots l}\,\alpha^l,$$

both series being convergent if $|\alpha| < r^r/(r + 1)^{r+1}$, and this condition is
satisfied.
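Lagrange's series converge quickly in the case treated next. A Python sketch summing a fixed number of terms of both series for $r = 10$, $p = \frac{1}{2}$ (so that $\alpha = 1/2048$); the cut-off of 30 terms and the function names are arbitrary choices, and the last line prints the residual of the trinomial equation as a check.

```python
import math

def xi1_lagrange(r, alpha, terms=30):
    """xi_1 = 1 + alpha + sum_{l>=2} (lr+2)(lr+3)...(lr+l) / l! * alpha^l."""
    total = 1.0
    for l in range(1, terms + 1):
        prod = math.prod(range(l * r + 2, l * r + l + 1))   # empty product = 1 when l = 1
        total += prod / math.factorial(l) * alpha ** l
    return total

def log_xi1_lagrange(r, alpha, terms=30):
    """log xi_1 = alpha + sum_{l>=2} (lr+1)(lr+2)...(lr+l-1) / l! * alpha^l."""
    total = 0.0
    for l in range(1, terms + 1):
        prod = math.prod(range(l * r + 1, l * r + l))       # empty product = 1 when l = 1
        total += prod / math.factorial(l) * alpha ** l
    return total

r, p = 10, 0.5
alpha = (1 - p) * p ** r
x = xi1_lagrange(r, alpha)
print(x, math.log(x), log_xi1_lagrange(r, alpha))
print(x - 1 - alpha * x ** (r + 1))   # residual of xi - 1 - alpha xi^(r+1) = 0
```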

8. Let us apply the approximate formula (8) to the case $p = q = \frac{1}{2}$
and $r = 10$. Using Lagrange's series, we find that

$$\xi_1 = 1.0004909$$

and

$$z_n = 1.003947\,(1.0004909)^{-n} + \theta\,\frac{10}{2^{n+1}}.$$

Hence, for $n = 100$, 1,000, 10,000, respectively,

$$z_n = 0.9559; \quad 0.6146; \quad 0.0074,$$

so that, for instance, the probabilities of a run of at least 10 heads in
100, 1,000, or 10,000 throws of a coin are, respectively,

$$0.0441; \quad 0.3854; \quad 0.9926.$$

Thus, in 10,000 throws, it is quite likely that heads would turn up 10 or
more times in succession.

In general, for a given $r$ and increasing $n$, the probability tends to 1,
so that in a very long series of trials, runs of any length are extremely
likely to occur, a conclusion which at first sight seems paradoxical.

9. In the preceding examples, an unknown probability was determined
by an ordinary equation in finite differences. Very often, however,
probability as a function of two or more independent variables is
defined by a partial difference equation in two or more independent
variables, together with a set of initial conditions suggested by the
problem itself. A few examples will suffice to illustrate the use of
partial equations in finite differences and to give an idea of the two
principal methods for their solution; namely, Laplace's method of
generating functions, and the less well known, but elegant, method
proposed by Lagrange.

We start with an analytical solution of the problem which was discussed
in detail in Chap. III.
Problem 3. Find the probability of exactly $x$ successes in $t$ independent
trials with the constant probability $p$.

Solution by Laplace's Method. Let us denote the required probability
by $y_{x,t}$. To obtain $x$ successes in $t$ trials is possible only in
two mutually exclusive ways: (a) by obtaining $x$ successes in $t - 1$ trials
and a failure at the last trial; (b) by obtaining success at the last trial
and $x - 1$ successes in the preceding $t - 1$ trials. The probability of
case (a) is $qy_{x,t-1}$ and that of case (b) is $py_{x-1,t-1}$. The total probability
$y_{x,t}$ satisfies the equation

$$y_{x,t} = py_{x-1,t-1} + qy_{x,t-1} \tag{9}$$

for all positive $x$ and $t$. This equation alone does not determine $y_{x,t}$
completely, but it does so in connection with certain initial conditions.
These conditions are

$$y_{x,0} = 0 \ \text{if}\ x > 0, \qquad y_{0,t} = q^t \ \text{if}\ t \ge 0. \tag{10}$$

The first set of equations is obvious; the second set is the expression
of the fact that if there are no successes in $t$ trials, the failures occur $t$
times in succession, and the probability for that is $q^t$.

Following Laplace, we introduce for a given $t$ the generating function
of $y_{0,t}, y_{1,t}, y_{2,t}, \ldots$, that is, the power series

$$\varphi_t(\xi) = y_{0,t} + y_{1,t}\xi + y_{2,t}\xi^2 + \cdots = \sum_{x=0}^{\infty} y_{x,t}\xi^x.$$

Taking $t - 1$ instead of $t$, separating the first term and multiplying by
$q$, we have

$$q\varphi_{t-1}(\xi) = qy_{0,t-1} + \sum_{x=1}^{\infty} qy_{x,t-1}\xi^x$$

and similarly

$$p\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} py_{x-1,t-1}\xi^x.$$

Adding and noting equation (9) we obtain

$$(p\xi + q)\varphi_{t-1}(\xi) = \varphi_t(\xi) + qy_{0,t-1} - y_{0,t},$$

but because of (10)

$$qy_{0,t-1} - y_{0,t} = q^t - q^t = 0,$$

and hence

$$\varphi_t(\xi) = (p\xi + q)\varphi_{t-1}(\xi)$$
for every positive $t$. Taking $t = 1, 2, \ldots$ and performing successive
substitutions, we get

$$\varphi_t(\xi) = (p\xi + q)^t\varphi_0(\xi),$$

and it remains only to find

$$\varphi_0(\xi) = y_{0,0} + y_{1,0}\xi + y_{2,0}\xi^2 + \cdots.$$

But on account of (10), $y_{x,0} = 0$ for $x > 0$, while $y_{0,0} = 1$. Thus,

$$\varphi_0(\xi) = 1$$

and

$$\varphi_t(\xi) = (p\xi + q)^t.$$

To find $y_{x,t}$ it remains to develop the right-hand member in a power series
of $\xi$ and to find the coefficient of $\xi^x$. The binomial theorem readily gives

$$y_{x,t} = \frac{t(t - 1)\cdots(t - x + 1)}{1 \cdot 2 \cdots x}\,p^x q^{t-x}.$$
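The recursion $\varphi_t(\xi) = (p\xi + q)\varphi_{t-1}(\xi)$ with $\varphi_0(\xi) = 1$ can be carried out on lists of coefficients, which amounts to a polynomial multiplication repeated $t$ times. A Python sketch with the illustrative values $p = 0.3$, $t = 12$ and a hypothetical function name; the coefficients should agree with the binomial formula.

```python
from math import comb

def coefficients_of_power(p, t):
    """Coefficients of (p xi + q)^t, obtained by applying
    phi_t = (p xi + q) phi_{t-1} t times, starting from phi_0 = 1."""
    q = 1 - p
    coeffs = [1.0]                       # phi_0(xi) = 1
    for _ in range(t):
        nxt = [0.0] * (len(coeffs) + 1)
        for x, c in enumerate(coeffs):
            nxt[x] += q * c              # contribution of q * phi_{t-1}
            nxt[x + 1] += p * c          # contribution of p xi * phi_{t-1}
        coeffs = nxt
    return coeffs

p, t = 0.3, 12
y = coefficients_of_power(p, t)
for x in (0, 3, 12):
    print(x, y[x], comb(t, x) * p ** x * (1 - p) ** (t - x))
```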




10. Poisson's Series of Trials. The analytical method thus enables
us to find the same expression for probabilities in a Bernoullian series
of trials as that obtained in Chap. III by elementary means. Considering
how simple it is to arrive at this expression, it may appear that a new
deduction of a known result is not a great gain. But one must bear in
mind that a little modification of the problem may bring new difficulties
which may be more easily overcome by the new method than by a generalization
of the old one. Poisson substituted for the Bernoullian series
another series of independent trials with probability varying from
trial to trial, so that in trials 1, 2, 3, 4, ... the same event $E$ has different
probabilities $p_1, p_2, p_3, p_4, \ldots$ and, correspondingly, the opposite event
has probabilities $q_1, q_2, q_3, q_4, \ldots$, where $q_k = 1 - p_k$ in general. Now,
for the Poisson series, the same question may be asked: what is the
probability $y_{x,t}$ of obtaining $x$ successes in $t$ trials? The solution of this
generalized problem is easier and more elegant if we make use of difference
equations.

First, in the same manner as before, we can establish the equation in
finite differences

$$y_{x,t} = p_t y_{x-1,t-1} + q_t y_{x,t-1}. \tag{11}$$

The corresponding set of initial conditions is

$$y_{x,0} = 0 \ \text{if}\ x > 0; \qquad y_{0,t} = q_1q_2\cdots q_t \ \text{if}\ t > 0; \qquad y_{0,0} = 1. \tag{12}$$

Giving $\varphi_t(\xi)$ the same meaning as above, we have
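$$q_t\varphi_{t-1}(\xi) = q_t y_{0,t-1} + \sum_{x=1}^{\infty} q_t y_{x,t-1}\xi^x, \qquad p_t\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p_t y_{x-1,t-1}\xi^x,$$

whence

$$(p_t\xi + q_t)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q_t y_{0,t-1} - y_{0,t};$$

but because of (12)

$$q_t y_{0,t-1} - y_{0,t} = q_1q_2\cdots q_t - q_1q_2\cdots q_t = 0,$$

and thus

$$\varphi_t(\xi) = (p_t\xi + q_t)\varphi_{t-1}(\xi),$$

whence again

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)\varphi_0(\xi).$$

However, by virtue of (12), $\varphi_0(\xi) = 1$, so that finally

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t).$$

To find the probability of $x$ successes in $t$ trials in Poisson's case, one
needs only to develop the product

$$(p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)$$

according to ascending powers of $\xi$ and to find the coefficient of $\xi^x$.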
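For Poisson's series the same multiplication of coefficient lists applies, with a different factor at each trial. A Python sketch; the probabilities $p_1, \ldots, p_5$ are arbitrary illustrations, the function names are hypothetical, and the brute-force sum over all success/failure patterns is included only as a check for small $t$.

```python
from itertools import product

def poisson_distribution(ps):
    """Coefficients of (p_1 xi + q_1)(p_2 xi + q_2)...(p_t xi + q_t): entry x is the
    probability of exactly x successes when trial i has success probability p_i."""
    coeffs = [1.0]                                   # phi_0(xi) = 1
    for p in ps:
        nxt = [0.0] * (len(coeffs) + 1)
        for x, c in enumerate(coeffs):
            nxt[x] += (1 - p) * c
            nxt[x + 1] += p * c
        coeffs = nxt
    return coeffs

def brute_force(ps, x):
    """Sum of the probabilities of all success/failure patterns with exactly x successes."""
    total = 0.0
    for pattern in product((0, 1), repeat=len(ps)):
        if sum(pattern) == x:
            prob = 1.0
            for s, p in zip(pattern, ps):
                prob *= p if s else 1 - p
            total += prob
    return total

ps = [0.1, 0.5, 0.25, 0.9, 0.6]                      # illustrative p_1, ..., p_5
dist = poisson_distribution(ps)
print([round(v, 6) for v in dist])
```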

11. Solution by Lagrange's Method. We shall now apply to equation (9)
the ingenious method devised by Lagrange, with a slight modification
intended to bring into full light the fundamental idea underlying this
method. Equation (9) possesses particular solutions of the form

$$\alpha^x\beta^t$$

if $\alpha$ and $\beta$ are connected by the equation

$$\alpha\beta = p + q\alpha.$$

Solving this equation for $\beta$, we find infinitely many particular solutions

$$\alpha^x(q + p\alpha^{-1})^t,$$

where $\alpha$ is absolutely arbitrary. Multiplying this expression by an
arbitrary function $\varphi(\alpha)$ and integrating between arbitrary limits, we
obtain other solutions of equation (9). Now the question arises of how
to choose $\varphi(\alpha)$ and the path of integration to satisfy not only equation (9)
but also initial conditions (10). We shall assume that $\varphi(\alpha)$ is a regular
function of a complex variable $\alpha$ in a ring between two concentric circles,
with their center at the origin, and that it can therefore be represented in
this ring by Laurent's series

$$\varphi(\alpha) = \sum_{n=-\infty}^{\infty} c_n\alpha^n.$$
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If c is a circle concentric with the regularity ring of (p{a) and situated 
inside it, the integral 





+ pa ^y<p{a)da 


is perfectly determined and represents a solution of (9). To satisfy 
the initial conditions, we have first the set of equations 



X = 1, 2, 3, . . . 


which show that all the coefficients Cn with negative subscripts vanish, 
and that ^(a) is regular about the origin. The second set of equations 
obtained by setting x = 0 


±J(g + pa-^y^da = 9 ' for < = 0 , 1 , 2 , . . . 

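serves to determine $\varphi(\alpha)$. If $\epsilon$ is a sufficiently small complex parameter,
this set of equations is entirely equivalent to a single equation (obtained by
multiplying the equation for t by $\epsilon^t$ and summing over t):

$$\frac{1}{2\pi i}\int_c \frac{\varphi(\alpha)\,d\alpha}{\alpha - \epsilon(p + q\alpha)} = \frac{1}{1 - q\epsilon}.$$

Now the integrand within the circle c has a single pole $\alpha_0$ determined by
the equation

$$\alpha_0 = \epsilon(p + q\alpha_0),$$

and the corresponding residue is

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon}.$$

At the same time, this is the value of the left-hand member of the above
equation, so that

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon} = \frac{1}{1 - q\epsilon}$$

or

$$\varphi(\alpha_0) = 1$$

for all sufficiently small $\epsilon$ or $\alpha_0$. That is, $\varphi(\alpha) = 1$ and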


$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(q + p\alpha^{-1})^t\,d\alpha$$

is the required solution. It remains to find the residue of the integrand;
that is, the coefficient of $1/\alpha$ in the development of

$$\alpha^{x-1}(q + p\alpha^{-1})^t$$

in series of ascending powers of $\alpha$. That can be easily done, using the
binomial development, and we obtain

$$y_{x,t} = \frac{t(t-1)\cdots(t-x+1)}{1\cdot 2\cdots x}\,p^x q^{t-x},$$

as it should be.
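The contour integral can also be checked numerically. In this Python sketch the function name, the choice of the unit circle for c and the number of sample points are all assumptions made for illustration. With $\alpha = e^{i\theta}$ one has $d\alpha = i\alpha\,d\theta$, so the integral equals the average of $\alpha^x(q + p\alpha^{-1})^t$ over equally spaced angles (the trapezoidal rule). Because this integrand is a finite Laurent polynomial, the average picks out exactly the coefficient of $1/\alpha$ once the number of points exceeds both x and t.

```python
import cmath
import math


def lagrange_integral(x, t, p, n_points=256):
    """Trapezoidal approximation of
    (1/(2*pi*i)) * integral over |alpha| = 1 of alpha**(x-1) * (q + p/alpha)**t d(alpha).
    """
    q = 1.0 - p
    total = 0j
    for k in range(n_points):
        alpha = cmath.exp(2j * math.pi * k / n_points)
        # d(alpha) = i*alpha*d(theta): the extra alpha turns alpha**(x-1) into alpha**x
        total += alpha ** x * (q + p / alpha) ** t
    return total / n_points


x, t, p = 3, 10, 0.3
approx = lagrange_integral(x, t, p)
exact = math.comb(t, x) * p ** x * (1 - p) ** (t - x)
print(approx.real, exact)
```

For x > t the same average comes out as zero, in agreement with the fact that more than t successes in t trials are impossible.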

12. Problem 4. Two players, A and B, agree to play a series of
games on the condition that A wins the series if he succeeds in winning a
games before B wins b games. The probability of winning a single game
is p for A and q = 1 − p for B, so that each game must be won by either
A or B. What is the probability that A will win the series?

Solution. This historically important problem was proposed as an
exercise (Prob. 12, page 58) with a brief indication of its solution based
on elementary principles. To solve it analytically, let us denote by
$y_{x,t}$ the probability that A will win when x games remain for him to win,
while his adversary B has t games left to win. Considering the result
of the game immediately following, we distinguish two alternatives:
(a) A wins the next game (probability p) and has to win x − 1 games
before B wins t games (probability $y_{x-1,t}$); (b) A loses the next game
(probability q) and has to win x games before B can win t − 1 games
(probability $y_{x,t-1}$). The probabilities of these two alternatives being
$p\,y_{x-1,t}$ and $q\,y_{x,t-1}$, their sum is the total probability $y_{x,t}$. Thus, $y_{x,t}$
satisfies the equation

$$y_{x,t} = p\,y_{x-1,t} + q\,y_{x,t-1}. \tag{13}$$

Now, $y_{x,0} = 0$ for x > 0, which means that A cannot win, B having
won all his games. Also, $y_{0,t} = 1$ for t > 0, which means that A surely
wins when he has no more games to win. The initial conditions in our
problem are, therefore,

$$y_{x,0} = 0 \quad \text{if } x > 0; \qquad y_{0,t} = 1 \quad \text{if } t > 0. \tag{14}$$

The symbol $y_{0,0}$ has no meaning as a probability, and remains undefined.
For the sake of simplicity we shall assume, however, that $y_{0,0} = 0$.
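Application of Laplace’s Method. Again, let

$$\varphi_x(\xi) = y_{x,0} + y_{x,1}\xi + y_{x,2}\xi^2 + \cdots$$

be the generating function of the sequence $y_{x,0}, y_{x,1}, y_{x,2}, \ldots$ corresponding
to an arbitrary x ≥ 0. We have

$$q\xi\,\varphi_x(\xi) = \sum_{t=1}^{\infty} q\,y_{x,t-1}\xi^t,$$

$$p\,\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty} p\,y_{x-1,t}\xi^t$$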
and

$$q\xi\,\varphi_x(\xi) + p\,\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty}\left(p\,y_{x-1,t} + q\,y_{x,t-1}\right)\xi^t$$

or, because of (13),

$$q\xi\,\varphi_x(\xi) + p\,\varphi_{x-1}(\xi) = p\,y_{x-1,0} - y_{x,0} + \varphi_x(\xi).$$

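Now, for every x > 0

$$y_{x,0} = y_{x-1,0} = 0$$

in conformity with the first set of initial conditions (and, for x = 1, with the
convention $y_{0,0} = 0$), which allows us to present the preceding relation as follows:

$$\varphi_x(\xi) = \frac{p}{1 - q\xi}\,\varphi_{x-1}(\xi),$$

whence

$$\varphi_x(\xi) = \frac{p^x}{(1 - q\xi)^x}\,\varphi_0(\xi).$$

But

$$\varphi_0(\xi) = y_{0,0} + y_{0,1}\xi + y_{0,2}\xi^2 + \cdots = \xi + \xi^2 + \xi^3 + \cdots = \frac{\xi}{1 - \xi},$$

and finally

$$\varphi_x(\xi) = \frac{\xi\,p^x}{(1 - \xi)(1 - q\xi)^x}.$$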

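It remains to develop the right-hand member in a power series of $\xi$ and
find the coefficient of $\xi^t$. As

$$\frac{1}{(1 - q\xi)^x} = 1 + \frac{x}{1}q\xi + \frac{x(x+1)}{1\cdot 2}q^2\xi^2 + \cdots$$

and

$$\frac{\xi}{1 - \xi} = \xi + \xi^2 + \xi^3 + \cdots,$$

we readily get, multiplying these series according to the ordinary rules,

$$y_{x,t} = p^x\left[1 + \frac{x}{1}q + \frac{x(x+1)}{1\cdot 2}q^2 + \cdots + \frac{x(x+1)\cdots(x+t-2)}{1\cdot 2\cdots(t-1)}q^{t-1}\right],$$

which coincides with the elementary solution indicated on page 58.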
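As a check on the algebra, here is a small Python sketch (the function names are hypothetical) that computes $y_{a,b}$ once directly from equation (13) with the initial conditions (14), and once from the closed form just obtained, both in exact rational arithmetic.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def points_by_recurrence(a, b, p):
    """y_{a,b} from y_{x,t} = p*y_{x-1,t} + q*y_{x,t-1} with conditions (14)."""
    p = Fraction(p)
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1) if t > 0 else Fraction(0)  # y_{0,0} = 0 by convention
        if t == 0:
            return Fraction(0)
        return p * y(x - 1, t) + q * y(x, t - 1)

    return y(a, b)


def points_closed_form(a, b, p):
    """p^a * [1 + a q + a(a+1)/(1*2) q^2 + ... up to the q^(b-1) term]."""
    p = Fraction(p)
    q = 1 - p
    return p ** a * sum(comb(a + k - 1, k) * q ** k for k in range(b))


print(points_by_recurrence(2, 3, Fraction(1, 2)))  # 11/16
```

With a = 2, b = 3 and p = 1/2, the value 11/16 is also the probability that A wins at least 2 of 4 fair games, as the elementary argument requires.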

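Application of Lagrange’s Method. Equation (13) has particular
solutions of the form

$$\alpha^x\beta^t,$$

where

$$\alpha\beta = p\beta + q\alpha.$$

Hence, we can either express $\alpha$ by $\beta$ or $\beta$ by $\alpha$. Leaving it to the reader
to follow the second alternative, we shall express $\alpha$ as a function of $\beta$,

$$\alpha = \frac{p\beta}{\beta - q},$$

and seek the required solution in the form

$$y_{x,t} = \frac{1}{2\pi i}\int_c \frac{p^x\beta^{x+t}}{(\beta - q)^x}\,\psi(\beta)\,d\beta,$$

where $\psi(\beta)$ is again supposed to be developable in Laurent’s series in a
certain ring; c is a circle described about the origin and entirely within
that ring. Setting x = 0, we must have

$$\frac{1}{2\pi i}\int_c \beta^t\psi(\beta)\,d\beta = 1 \quad \text{for } t = 1, 2, 3, \ldots$$

and this set of equations is satisfied if we take

$$\psi(\beta) = \frac{1}{\beta^2} + \frac{1}{\beta^3} + \frac{1}{\beta^4} + \cdots$$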

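Now we have

$$\psi(\beta) = \frac{1}{\beta(\beta - 1)} \qquad (|\beta| > 1),$$

and for t = 0

$$y_{x,0} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{x-1}\,d\beta}{(\beta - q)^x(\beta - 1)} = 0$$

as it should be, because for $|\beta| > 1$ the integrand can be developed into a
power series of $1/\beta$, the term with $1/\beta$ being absent. Thus, the required
solution is given by

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)},$$

where c is a circle of radius > 1 described about the origin. The final
expression for $y_{x,t}$ is obtained as the coefficient of $1/\beta$ in the development
of

$$\frac{p^x\beta^{t-1}}{(1 - q\beta^{-1})^x(\beta - 1)}$$

into power series of $1/\beta$. We obtain the same expression as before.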
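The integral with respect to $\beta$ can be evaluated numerically in the same way. In the sketch below (Python; the function names, the radius 2 and the sample count are assumptions, and any radius greater than 1 would serve), the trapezoidal rule on $|\beta| = 2$ is compared with the series found by Laplace’s method. Both poles, $\beta = q$ and $\beta = 1$, lie inside the circle, and the integrand is regular outside it, so the error of the rule decreases geometrically with the number of points.

```python
import cmath
import math


def points_contour(x, t, p, radius=2.0, n_points=512):
    """Trapezoidal approximation of
    (p**x / (2*pi*i)) * integral over |beta| = radius of
    beta**(t-1) / ((1 - q/beta)**x * (beta - 1)) d(beta).
    """
    q = 1.0 - p
    total = 0j
    for k in range(n_points):
        beta = radius * cmath.exp(2j * math.pi * k / n_points)
        # d(beta) = i*beta*d(theta), so beta**(t-1) becomes beta**t
        total += beta ** t / ((1 - q / beta) ** x * (beta - 1))
    return p ** x * total / n_points


def points_series(x, t, p):
    """p^x * [1 + x q + x(x+1)/2 q^2 + ... up to the q^(t-1) term], for x >= 1."""
    q = 1.0 - p
    return p ** x * sum(math.comb(x + k - 1, k) * q ** k for k in range(t))


print(points_contour(2, 3, 0.5).real, points_series(2, 3, 0.5))  # both close to 0.6875
```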


Problems for Solution 

1. Each of n urns contains a white and b black balls. One ball is transferred from
the first urn into the second, another one from the second into the third, and so on.
Finally, a ball is drawn from the nth urn. What is the probability that it is white,
when it is known that the first ball transferred was white?

Ans. $\frac{a}{a+b} + \frac{b}{a+b}(a + b + 1)^{1-n}$.
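The answer can be checked by following the probability of a white ball from urn to urn. In this Python sketch (the function names are hypothetical), $P_k$ is the probability that the ball taken from the kth urn is white. Urn k + 1 then holds a + 1 white balls with probability $P_k$ and a white balls otherwise, which gives $P_{k+1} = (a + P_k)/(a + b + 1)$ with $P_1 = 1$.

```python
from fractions import Fraction


def white_from_last_urn(n, a, b):
    """P(ball drawn from urn n is white | ball moved out of urn 1 was white)."""
    prob = Fraction(1)  # P_1 = 1: the first transferred ball is white
    for _ in range(n - 1):
        # Urn k+1 holds a white and b black balls plus the transferred ball.
        prob = (prob * (a + 1) + (1 - prob) * a) / (a + b + 1)
    return prob


def answer(n, a, b):
    return Fraction(a, a + b) + Fraction(b, a + b) / Fraction(a + b + 1) ** (n - 1)


print(white_from_last_urn(4, 2, 3), answer(4, 2, 3))  # 29/72 29/72
```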
2. Two urns contain, respectively, a white and b black, and b white and a black
balls. A series of drawings is made, according to the following rules:

a. Each time only one ball is drawn and immediately returned to the same urn it
came from.

b. If the ball drawn is white, the next drawing is made from the first urn.

c. If it is black, the next drawing is made from the second urn.

d. The first ball drawn comes from the first urn.

What is the probability that the nth ball drawn will be white?

Ans. $p_n = \frac{1}{2} + \frac{1}{2}\left(\frac{a - b}{a + b}\right)^n$.
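A minimal Python check (the function name is hypothetical): if $p_n$ is the probability that the nth ball is white, rules b and c give $p_{n+1} = p_n\frac{a}{a+b} + (1 - p_n)\frac{b}{a+b}$, and rule d gives $p_1 = \frac{a}{a+b}$. Iterating this with exact fractions can be compared with the closed expression for $p_n$.

```python
from fractions import Fraction


def nth_white(n, a, b):
    """Probability that the nth ball drawn is white under rules a-d."""
    white1 = Fraction(a, a + b)  # proportion of white balls in the first urn
    white2 = Fraction(b, a + b)  # proportion of white balls in the second urn
    p = white1                   # rule d: the first ball comes from the first urn
    for _ in range(n - 1):
        # rule b: after a white ball use urn 1; rule c: after a black ball use urn 2
        p = p * white1 + (1 - p) * white2
    return p


a, b = 5, 2
for n in range(1, 5):
    formula = Fraction(1, 2) + Fraction(1, 2) * Fraction(a - b, a + b) ** n
    print(n, nth_white(n, a, b), formula)
```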



3. Find the probability of a run of 5 in a series of 15 trials with constant
probability p = 1/3. Ans. $y_{15} = 23\cdot 3^{-6} - 70\cdot 3^{-12} = 0.0314184$.
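Answers of this kind can be verified by a direct computation over the length of the current run of successes. The Python sketch below uses exact fractions, and its function name is hypothetical. For each k < r it keeps the probability that no run of r has occurred yet and that the last k trials were successes.

```python
from fractions import Fraction


def prob_run(n, r, p):
    """Probability of at least one run of r consecutive successes in n
    independent trials with constant success probability p."""
    p = Fraction(p)
    q = 1 - p
    # state[k] = P(no run of r yet, and the current streak of successes is k)
    state = [Fraction(0)] * r
    state[0] = Fraction(1)
    done = Fraction(0)  # P(a run of r has already occurred)
    for _ in range(n):
        new = [Fraction(0)] * r
        new[0] = q * sum(state)        # a failure resets the streak
        for k in range(r - 1):
            new[k + 1] = p * state[k]  # a success lengthens the streak
        done += p * state[r - 1]       # the streak reaches r: the run is complete
        state = new
    return done


value = prob_run(15, 5, Fraction(1, 3))
print(value == Fraction(23, 3 ** 6) - Fraction(70, 3 ** 12))  # True
print(float(value))
```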

4. How many throws of a coin suffice to give a probability of more than 0.999 for
a run of at least 100 heads? Ans. $1.76\cdot 10^{31}$ throws suffice.

5. What is the least number of trials assuring a probability of at least 1/2 for a run of at
least 10 successes if p = q = 1/2? Ans. 1,420.

6. Seven urns contain black and white balls in the following proportions:

[Table: proportions of black and white balls in urns 1 to 7]

One ball is drawn from each urn. What is the probability that there will be among
them exactly 3 white balls? Ans. Coefficient of $\xi^3$ in the product of the seven
factors $(p_i\xi + q_i)$, where $p_i$ and $q_i$ are the proportions of white and black balls
in the ith urn; numerically, 0.28025.

7. Two players, each possessing 2 dollars, agree to play a series of games. The
probability of winning a single game is 1/2 for both, and the loser pays 1 dollar to his
adversary after each game. Find the probability for each one of them to be ruined at or
before the nth game.

Solution. Let $y_m$ be the probability that after playing 2m games, neither of the
players is ruined. We have

$$y_{m+1} = \tfrac{1}{2}\,y_m$$

and hence

$$y_m = \frac{1}{2^m}.$$

The probability for one of the players to be ruined at or before the nth game is
$\frac{1}{2} - \frac{1}{2^{m+1}}$ if n = 2m or n = 2m + 1.

8. Solve the same problem if each player enters the game with 3 dollars.

Ans. $\frac{1}{2}\left[1 - \left(\frac{3}{4}\right)^{m-1}\right]$ if n = 2m − 1 or n = 2m.
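Both answers can be confirmed by following the fortune of the first player game by game. In this Python sketch (the function name is hypothetical), the dictionary `dist` holds, for each possible capital of the first player, the probability that nobody has been ruined yet. Paths on which a player is ruined are removed as soon as the ruin occurs.

```python
from fractions import Fraction


def ruin_by_game(capital, n):
    """P(first player ruined at or before game n), both players starting with
    `capital` dollars and each game moving 1 dollar with probability 1/2 each way."""
    half = Fraction(1, 2)
    total = 2 * capital
    dist = {capital: Fraction(1)}  # capital of player 1 -> probability, nobody ruined
    ruined_first = Fraction(0)
    for _ in range(n):
        new = {}
        for k, prob in dist.items():
            for step in (-1, 1):
                j = k + step
                if j == 0:
                    ruined_first += prob * half  # the first player is ruined
                elif j == total:
                    pass                         # the second player is ruined
                else:
                    new[j] = new.get(j, Fraction(0)) + prob * half
        dist = new
    return ruined_first


for n in range(1, 8):
    print(n, ruin_by_game(2, n), ruin_by_game(3, n))
```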

9. Players $A_1, A_2, \ldots, A_{n+1}$ play a series of games in the following order: first $A_1$
plays with $A_2$; the loser is out and the winner plays with the following player, $A_3$; the
loser is out again and the next game is played with $A_4$, and so on, the loser always being
out and his place taken by the next following player. The probability of winning a
single game is 1/2 for each player, and the series is won by the player who succeeds in
winning over all his adversaries in succession. What is the probability that the
series will stop exactly at the xth game? What is the probability that the series will
stop before or at the xth game?

Solution. Let $y_x$ be the probability that the series terminates exactly at the xth
game. That means that the player who won the game entered at the (x − n + 1)st
game and won successively the n games from the (x − n + 1)st to the xth. Now, there
are n − 1 cases to be distinguished according as the player beaten at the (x − n + 1)st
game has already won 1, 2, 3, . . . , n − 1 games. Let $p_k$ be the probability that the
loser in the (x − n + 1)st game previously has won k games. The probability of ending
the series in this case is $p_k/2^n$. On the other hand,

$$y_{x-k} = \frac{p_k}{2^{n-k}},$$

so that

$$\frac{p_k}{2^n} = \frac{y_{x-k}}{2^k}.$$

Hence, for x > n,

$$y_x = \frac{1}{2}y_{x-1} + \frac{1}{4}y_{x-2} + \cdots + \frac{1}{2^{n-1}}y_{x-n+1}.$$

Initial conditions:

$$y_1 = y_2 = \cdots = y_{n-1} = 0; \qquad y_n = \frac{1}{2^{n-1}}.$$

The generating function of $y_x$:

$$y_1 + y_2\xi + y_3\xi^2 + \cdots = \frac{\xi^{n-1}(2 - \xi)}{2^n(1 - \xi) + \xi^n},$$

and the generating function of the probability that the series will end before or at the
xth game is

$$\frac{\xi^{n-1}(2 - \xi)}{(1 - \xi)\left[2^n(1 - \xi) + \xi^n\right]}.$$
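The difference equation for $y_x$ can be tested against a direct computation. In the Python sketch below (the function names are hypothetical), the only feature of the series that matters after each game is how many games in a row the current winner has won. With probability 1/2 he lengthens his streak; otherwise he is replaced by a newcomer whose streak is one win. The series stops when the streak reaches n.

```python
from fractions import Fraction


def stop_probabilities_chain(n, x_max):
    """y[x] = P(series ends exactly at game x), by following the winner's streak."""
    half = Fraction(1, 2)
    y = [Fraction(0)] * (x_max + 1)
    if n == 1:              # two players: the first game decides the series
        y[1] = Fraction(1)
        return y
    dist = {1: Fraction(1)}  # after game 1 the winner has won one game
    for x in range(2, x_max + 1):
        new = {}
        for streak, prob in dist.items():
            for s in (streak + 1, 1):  # the holder wins / the newcomer wins
                if s == n:
                    y[x] += prob * half
                else:
                    new[s] = new.get(s, Fraction(0)) + prob * half
        dist = new
    return y


def stop_probabilities_recurrence(n, x_max):
    """y[x] from y_x = y_{x-1}/2 + ... + y_{x-n+1}/2^(n-1) and the initial conditions."""
    y = [Fraction(0)] * (x_max + 1)
    if n <= x_max:
        y[n] = Fraction(1, 2 ** (n - 1))
    for x in range(n + 1, x_max + 1):
        y[x] = sum(y[x - k] / 2 ** k for k in range(1, n))
    return y


print(stop_probabilities_chain(3, 8) == stop_probabilities_recurrence(3, 8))  # True
```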



10. Three players, A, B, C, play a series of games, each game being won by one of
them. If the probabilities for A, B, C to win a single game are p, q, r, find the
probability of A winning a games before B and C win b and c games, respectively.

Solution. Let $A_{x,y,z}$ denote the probability for A to win the series when he has
still to win x games, while B and C have to win y and z games, respectively. First,
we can establish the equation

$$A_{x,y,z} = p\,A_{x-1,y,z} + q\,A_{x,y-1,z} + r\,A_{x,y,z-1}.$$

Next, $A_{0,y,z} = 1$ for positive y, z, and $A_{x,0,z} = 0$ for positive x, z; $A_{x,y,0} = 0$ for
positive x, y. Besides, although this is only a formal simplification, we shall assume
$A_{x,0,z} = 0$, $A_{x,y,0} = 0$ also when x or y or z vanishes. For the generating function

$$\varphi_x(\xi, \eta) = \sum_{y,z=0}^{\infty} A_{x,y,z}\,\xi^y\eta^z$$

we find the equation

$$\varphi_x(\xi, \eta) = p\,\varphi_{x-1}(\xi, \eta) + (q\xi + r\eta)\,\varphi_x(\xi, \eta),$$

whence

$$\varphi_x(\xi, \eta) = \frac{p^x\,\xi\eta}{(1 - q\xi - r\eta)^x(1 - \xi)(1 - \eta)}.$$

The final answer is

$$A_{a,b,c} = p^a\left[1 + \frac{a}{1}(q + r) + \frac{a(a+1)}{1\cdot 2}(q + r)^2 + \cdots\right]',$$

the dash indicating that powers of q and r with the exponents ≥ b and ≥ c, respectively, are omitted.

Obviously, the same method can be extended to any number of players, and leads 
to a perfectly analogous expression of probability. 
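The final formula for three players is easy to test against the difference equation. In the Python sketch below (the function names are hypothetical), the bracket with the dash is expanded as $\sum_{j<b,\,k<c}\frac{(a-1+j+k)!}{(a-1)!\,j!\,k!}q^jr^k$. This is what remains of $\sum_m \frac{a(a+1)\cdots(a+m-1)}{m!}(q+r)^m$ after the omitted powers of q and r are struck out.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


def three_players_recurrence(a, b, c, p, q, r):
    """A_{a,b,c} from A_{x,y,z} = p A_{x-1,y,z} + q A_{x,y-1,z} + r A_{x,y,z-1}."""
    @lru_cache(maxsize=None)
    def A(x, y, z):
        if x == 0:
            return Fraction(1) if y > 0 and z > 0 else Fraction(0)
        if y == 0 or z == 0:
            return Fraction(0)
        return p * A(x - 1, y, z) + q * A(x, y - 1, z) + r * A(x, y, z - 1)

    return A(a, b, c)


def three_players_formula(a, b, c, p, q, r):
    """p^a times the expanded series, keeping q^j r^k only for j < b and k < c."""
    total = Fraction(0)
    for j in range(b):
        for k in range(c):
            coeff = factorial(a - 1 + j + k) // (factorial(a - 1) * factorial(j) * factorial(k))
            total += coeff * q ** j * r ** k
    return p ** a * total


p, q, r = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
print(three_players_recurrence(2, 2, 3, p, q, r) == three_players_formula(2, 2, 3, p, q, r))  # True
```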

11. An urn contains n balls altogether, and among them a white balls. In a series
of drawings, each time one ball is drawn, whatever its color may be, it is replaced by
a white ball. Find the probability $y_{x,r}$ that after r drawings there are x white balls
in the urn.

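Solution. The required probability satisfies the equation

$$y_{x,r+1} = \frac{n - x + 1}{n}\,y_{x-1,r} + \frac{x}{n}\,y_{x,r}.$$

Besides,

$$y_{a,0} = 1, \qquad y_{x,0} = 0 \quad \text{if } x \ne a, \qquad y_{x,r} = 0 \quad \text{if } x < a.$$

From the preceding equation, combined with the initial conditions, we find
successively

$$y_{a,r} = \left(\frac{a}{n}\right)^r,$$

$$y_{a+1,r} = (n - a)\left[\left(\frac{a+1}{n}\right)^r - \left(\frac{a}{n}\right)^r\right],$$

$$y_{a+2,r} = \frac{(n-a)(n-a-1)}{1\cdot 2}\left[\left(\frac{a+2}{n}\right)^r - 2\left(\frac{a+1}{n}\right)^r + \left(\frac{a}{n}\right)^r\right],$$

$$y_{a+3,r} = \frac{(n-a)(n-a-1)(n-a-2)}{1\cdot 2\cdot 3}\left[\left(\frac{a+3}{n}\right)^r - 3\left(\frac{a+2}{n}\right)^r + 3\left(\frac{a+1}{n}\right)^r - \left(\frac{a}{n}\right)^r\right],$$

and so on.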
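The values found successively in this way follow a single pattern, $y_{a+k,r} = \binom{n-a}{k}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}\left(\frac{a+j}{n}\right)^r$, which can be confirmed by induction on r. The Python sketch below (the function names are hypothetical) checks this general form against the difference equation with exact fractions.

```python
from fractions import Fraction
from math import comb


def white_counts_recurrence(n, a, r_max):
    """Table y[r][x] from y_{x,r+1} = (n-x+1)/n y_{x-1,r} + x/n y_{x,r}."""
    y = [[Fraction(0)] * (n + 1) for _ in range(r_max + 1)]
    y[0][a] = Fraction(1)
    for r in range(r_max):
        for x in range(a, n + 1):
            # y_{a-1,r} = 0, so urn states below a never contribute
            from_below = Fraction(n - x + 1, n) * y[r][x - 1] if x > a else Fraction(0)
            y[r + 1][x] = from_below + Fraction(x, n) * y[r][x]
    return y


def white_counts_formula(n, a, k, r):
    """C(n-a, k) * sum_j (-1)^(k-j) C(k, j) ((a+j)/n)^r."""
    return comb(n - a, k) * sum(
        (-1) ** (k - j) * comb(k, j) * Fraction(a + j, n) ** r for j in range(k + 1))


y = white_counts_recurrence(5, 2, 6)
print(all(y[r][2 + k] == white_counts_formula(5, 2, k, r)
          for r in range(7) for k in range(4)))  # True
```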

12. If, in the problem of runs, p is supposed to be > r/(r + 1), prove that the probability
of a run of r in n trials is greater than … , where $p_1 < r/(r + 1)$ is a root of the equation

$$p_1^r(1 - p_1) = p^r(1 - p).$$
13. To find an asymptotic expression of probability for a run of r in n independent
trials, if p ≤ r/(r + 1), the following proposition is of importance: Imaginary and negative
roots of the equation

$$(1 - \alpha)x^n - x + \alpha = 0, \qquad 0 < \alpha \le \frac{n - 1}{n},$$

are, in absolute value, greater than the root R > 1 of the equation … .

Prove the truth of this statement.

14. Given s urns containing the same number n of black and white balls in known
proportions, drawings are made in the following manner: first, a single ball is drawn
out of every urn; second, the ball drawn from the first urn is placed into the second;
that drawn from the second is placed in the third, and so on; finally, the ball drawn
from the last urn is placed in the first, so that again every urn contains n balls.
Supposing that this operation is repeated t times, find the probability of drawing a white
ball from the xth urn.

Solution. Let $y_{x,t}$ be the required probability. First, it can be shown that it
satisfies the equation

$$y_{x,t} = \left(1 - \frac{1}{n}\right)y_{x,t-1} + \frac{1}{n}\,y_{x-1,t-1}.$$

The initial probabilities $y_{1,0}, y_{2,0}, \ldots, y_{s,0}$ are known; and, moreover, the function
$y_{x,t}$ must satisfy a boundary condition of the periodic type, $y_{0,t} = y_{s,t}$. Hence,
applying Lagrange’s method, the following solution is found:

$$y_{x,t} = \left(1 - \frac{1}{n}\right)^t\left[f(x) + \frac{t}{1\cdot(n-1)}f(x-1) + \frac{t(t-1)}{1\cdot 2\cdot(n-1)^2}f(x-2) + \cdots\right],$$

where

$$f(x) = y_{x,0} \quad \text{when } x > 0,$$

and the definition is extended to x ≤ 0 by setting

$$f(-x) = f(s - x).$$

If, to begin with, all urns contain the same numbers of white and black balls, so that
f(x) = const. = p, we shall have, no matter what t is,

$$y_{x,t} = p.$$
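The solution can be compared with a direct iteration of the difference equation. In the Python sketch below, the function names and the sample initial proportions are assumptions. The periodic boundary condition $y_{0,t} = y_{s,t}$ is implemented with Python’s negative list indexing, and f is extended by $f(-x) = f(s - x)$.

```python
from fractions import Fraction
from math import comb


def cyclic_urns_recurrence(init, n, t):
    """Iterate y_{x,t} = (1 - 1/n) y_{x,t-1} + (1/n) y_{x-1,t-1} t times.

    init[i] is y_{i+1,0}; the index i - 1 = -1 wraps around to urn s, which
    is the periodic boundary condition y_{0,t} = y_{s,t}.
    """
    y = [Fraction(v) for v in init]
    stay, move = 1 - Fraction(1, n), Fraction(1, n)
    for _ in range(t):
        y = [stay * y[i] + move * y[i - 1] for i in range(len(y))]
    return y


def cyclic_urns_formula(init, n, t, x):
    """(1 - 1/n)^t * [f(x) + t/(n-1) f(x-1) + t(t-1)/(2(n-1)^2) f(x-2) + ...]."""
    s = len(init)

    def f(u):
        # f(u) = y_{u,0} for 1 <= u <= s, extended periodically: f(-u) = f(s - u)
        return Fraction(init[(u - 1) % s])

    total = sum(comb(t, k) * f(x - k) / Fraction(n - 1) ** k for k in range(t + 1))
    return (1 - Fraction(1, n)) ** t * total


init = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # assumed initial proportions
n, t = 4, 5
print([cyclic_urns_formula(init, n, t, x) for x in (1, 2, 3)]
      == cyclic_urns_recurrence(init, n, t))  # True
```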
CHAPTER VI


BERNOULLI’S THEOREM 


1. This chapter will be devoted to one of the most important and
beautiful theorems in the theory of probability, discovered by Jacob
Bernoulli and published with a proof remarkably rigorous (save for some
irrelevant limitations assumed in the proof) in his admirable posthumous
book “Ars conjectandi” (1713). This book is the first attempt at scientific
exposition of the theory of probability as a separate branch of
mathematical science.

If, in n trials, an event E occurs m times, the number m is called the
“frequency” of E in n trials, and the ratio m/n receives the name of
“relative frequency.” Bernoulli’s theorem reveals an important probability
relation between the relative frequency of E and its probability p.

Bernoulli’s Theorem. With the probability approaching 1 or certainty
as near as we please, we may expect that the relative frequency of an event E
in a series of independent trials with constant probability p will differ from
that probability by less than any given number ε > 0, provided the number
of trials is taken sufficiently large.

In other words, given two positive numbers ε and η, the probability
P of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than 1 − η if the number of trials is above a certain
limit depending upon ε and η.
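A small numerical illustration may help fix ideas. The Python sketch below (the function name is hypothetical; p = 1/2 and ε = 1/20 are sample values) computes the probability P of this inequality exactly, as a sum of binomial probabilities, for increasing n. In this example the values climb toward 1, as the theorem asserts.

```python
from fractions import Fraction
from math import comb


def prob_within(n, p, eps):
    """Exact P(|m/n - p| < eps) when m is the number of successes in n trials.

    p and eps should be passed as Fractions so that the test at the
    boundary |m/n - p| = eps is decided exactly.
    """
    p, eps = Fraction(p), Fraction(eps)
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(n + 1) if abs(Fraction(m, n) - p) < eps)


for n in (10, 100, 1000):
    print(n, float(prob_within(n, Fraction(1, 2), Fraction(1, 20))))
```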

Proof. Several proofs of this important theorem are known which 
are shorter and simpler but less natural than Bernoulli’s original proof. 
It is his remarkable proof that we shall reproduce here in modernized 
form. 

a. Denoting by $T_m$, as usual, the probability of m successes in n trials,
we shall show first that

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a} \tag{1}$$

if b > a and k > 0. Since the ratio

$$\frac{T_{x+1}}{T_x} = \frac{n - x}{x + 1}\cdot\frac{p}{q}$$

decreases as x increases, we have for b > a

$$\frac{T_{b+1}}{T_b} < \frac{T_{a+1}}{T_a}$$

or

$$\frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a}.$$

Changing b, a, respectively, into b + 1, a + 1; b + 2, a + 2; . . . ; b + k − 1,
a + k − 1, it follows from the last inequality that

$$\frac{T_{b+k}}{T_{a+k}} < \frac{T_{b+k-1}}{T_{a+k-1}} < \cdots < \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a},$$

that is,

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a}.$$
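Inequality (1) is easy to spot-check. The following Python sketch uses exact fractions, and the sample values n = 20 and p = 3/10 are assumptions. It verifies that the ratio $T_{x+1}/T_x$ agrees with $\frac{n-x}{x+1}\cdot\frac{p}{q}$, that this ratio decreases with x, and that (1) holds for every admissible b > a and k > 0.

```python
from fractions import Fraction
from math import comb

n, p = 20, Fraction(3, 10)
q = 1 - p
T = [comb(n, m) * p ** m * q ** (n - m) for m in range(n + 1)]

ratios = [T[x + 1] / T[x] for x in range(n)]
assert all(ratios[x] == Fraction(n - x, x + 1) * p / q for x in range(n))
assert all(ratios[x + 1] < ratios[x] for x in range(n - 1))

# Inequality (1): T_{b+k}/T_b < T_{a+k}/T_a whenever b > a, k > 0 and b + k <= n.
ok = all(T[b + k] / T[b] < T[a + k] / T[a]
         for a in range(n + 1) for b in range(a + 1, n + 1)
         for k in range(1, n - b + 1))
print(ok)  # True
```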


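b. Integers λ and μ being determined by the inequalities

$$\lambda - 1 < np \le \lambda, \qquad \mu - 1 < np + n\epsilon \le \mu,$$

the probabilities A and C of the inequalities

$$0 \le \frac{m}{n} - p < \epsilon; \qquad \frac{m}{n} - p \ge \epsilon$$

are represented, respectively, by the sums

$$A = T_\lambda + T_{\lambda+1} + \cdots + T_{\mu-1},$$
$$C = T_\mu + T_{\mu+1} + \cdots + T_n,$$

the first of which contains μ − λ = g terms. Combining terms of the
second sum into groups of g terms (the last group may consist of less than
g terms) and setting for brevity

$$A_1 = T_\mu + T_{\mu+1} + \cdots + T_{\mu+g-1},$$
$$A_2 = T_{\mu+g} + T_{\mu+g+1} + \cdots + T_{\mu+2g-1},$$
$$A_3 = T_{\mu+2g} + T_{\mu+2g+1} + \cdots + T_{\mu+3g-1},$$
$$\cdots\cdots\cdots$$

we shall have

$$C = A_1 + A_2 + A_3 + \cdots$$

and at the same time

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}, \qquad \ldots \tag{2}$$

The ratio

$$\frac{A_1}{A} = \frac{T_{\lambda+g} + T_{\lambda+g+1} + \cdots + T_{\lambda+2g-1}}{T_\lambda + T_{\lambda+1} + \cdots + T_{\lambda+g-1}}$$

is less than the greatest of the numbers

$$\frac{T_{\lambda+g}}{T_\lambda}, \quad \frac{T_{\lambda+g+1}}{T_{\lambda+1}}, \quad \ldots, \quad \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}}.$$

But by inequality (1)

$$\frac{T_{\lambda+g}}{T_\lambda} > \frac{T_{\lambda+g+1}}{T_{\lambda+1}} > \cdots > \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}},$$

and hence, since λ + g = μ,

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}.$$

Similarly,

$$\frac{A_2}{A_1} < \frac{T_{\mu+g}}{T_\mu}, \qquad \frac{A_3}{A_2} < \frac{T_{\mu+2g}}{T_{\mu+g}}, \qquad \ldots$$

and again by inequality (1)

$$\frac{T_{\mu+g}}{T_\mu} < \frac{T_\mu}{T_\lambda}, \qquad \frac{T_{\mu+2g}}{T_{\mu+g}} < \frac{T_\mu}{T_\lambda}, \qquad \ldots$$

Consequently

$$\frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}, \qquad \ldots$$

and inequalities (2) are established.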
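c. For x ≥ λ,

$$\frac{T_{x+1}}{T_x} < 1.$$

Since the ratio decreases as x increases, it suffices to show that

$$\frac{T_{\lambda+1}}{T_\lambda} < 1.$$

As λ ≥ np,

$$\frac{T_{\lambda+1}}{T_\lambda} = \frac{n - \lambda}{\lambda + 1}\cdot\frac{p}{q} \le \frac{n - np}{np + 1}\cdot\frac{p}{q} = \frac{npq}{npq + q},$$

which shows that $T_{\lambda+1}/T_\lambda < 1$.

The inequality just established shows that in the following expression:

$$\frac{T_\mu}{T_\lambda} = \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}\cdots\frac{T_{\lambda+1}}{T_\lambda}$$

all the factors are < 1. Consequently, if we retain α ≤ g first factors
only, replacing the others by 1, we get

$$\frac{T_\mu}{T_\lambda} < \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}} < \left(\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}\right)^{\alpha}.$$

Moreover,

$$\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}} = \frac{n - \mu + \alpha}{\mu - \alpha + 1}\cdot\frac{p}{q},$$

whence the following important inequality results:

$$\frac{T_\mu}{T_\lambda} < \left(\frac{n - \mu + \alpha}{\mu - \alpha + 1}\cdot\frac{p}{q}\right)^{\alpha}. \tag{3}$$

Here α is an arbitrary positive integer ≤ g.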

Now, let ε be an arbitrary positive number, and let n satisfy

$$n \ge \frac{\alpha(1 + \epsilon) - q}{\epsilon(p + \epsilon)}. \tag{4}$$

Then we can show that we have both

$$\frac{n - \mu + \alpha}{\mu - \alpha + 1}\cdot\frac{p}{q} \le \frac{p}{p + \epsilon} \tag{i}$$

and

$$\alpha \le g. \tag{ii}$$

Since μ ≥ np + nε and the left-hand side of (i) decreases as μ increases, it suffices
to show that (i) is satisfied for μ = np + nε. If μ = np + nε, inequality (i) is
equivalent to

$$\frac{nq - n\epsilon + \alpha}{np + n\epsilon - \alpha + 1} \le \frac{q}{p + \epsilon}$$

or, after obvious simplifications,

$$n\epsilon(p + \epsilon) \ge \alpha(1 + \epsilon) - q.$$

But this inequality follows from (4). To establish (ii), since α and g
are integers, it suffices to show that α < g + 1. But μ ≥ np + nε,
λ < np + 1, and consequently g + 1 > nε. Hence (ii) will be established
if we can show that nε ≥ α, which by virtue of (4) will be true if

$$\frac{\alpha(1 + \epsilon) - q}{p + \epsilon} \ge \alpha,$$

that is, if

$$\alpha(1 + \epsilon) - q \ge \alpha p + \alpha\epsilon,$$

or αq − q ≥ 0, which is obviously true, α being a positive integer.

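d. The auxiliary integer α is still at our disposal. Given an arbitrary
positive number η < 1, we shall determine α as the least integer satisfying
the inequality

$$\left(\frac{p}{p + \epsilon}\right)^{\alpha} \le \eta.$$

At the same time

$$\left(\frac{p}{p + \epsilon}\right)^{\alpha - 1} > \eta, \qquad \text{that is,} \qquad \alpha < 1 + \frac{\log\frac{1}{\eta}}{\log\left(1 + \frac{\epsilon}{p}\right)},$$

and since $\log\left(1 + \frac{\epsilon}{p}\right) > \frac{\epsilon}{p + \epsilon}$, we shall have

$$\alpha < 1 + \frac{p + \epsilon}{\epsilon}\log\frac{1}{\eta}$$

and

$$\frac{\alpha(1 + \epsilon) - q}{\epsilon(p + \epsilon)} < \frac{1 + \epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

Consequently, if

$$n \ge \frac{1 + \epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}, \tag{5}$$

then (4) holds, and by virtue of (i) and (3)

$$\frac{T_\mu}{T_\lambda} < \left(\frac{p}{p + \epsilon}\right)^{\alpha} \le \eta,$$

and by virtue of (2)

$$A_1 < A\eta, \qquad A_2 < A_1\eta < A\eta^2, \qquad A_3 < A_2\eta < A\eta^3, \qquad \ldots,$$

whence

$$C < A\eta + A\eta^2 + A\eta^3 + \cdots = \frac{A\eta}{1 - \eta}. \tag{6}$$

This inequality holds if n satisfies (5). No trace of the auxiliary
integer α is left.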
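The chain of estimates in b to d can be followed numerically. The Python sketch below is an illustration with assumed values p = 3/10, ε = 1/10, η = 1/10, using natural logarithms as in (5). It takes the least n allowed by (5), determines λ, μ, g and α as in the text, and checks (ii), inequality (3), the bound $T_\mu/T_\lambda < \eta$, and inequality (6), all in exact rational arithmetic.

```python
import math
from fractions import Fraction

p, eps, eta = Fraction(3, 10), Fraction(1, 10), Fraction(1, 10)
q = 1 - p

# Least n satisfying (5), with the natural logarithm.
n = math.ceil(float((1 + eps) / eps ** 2) * math.log(float(1 / eta)) + float(1 / eps))
T = [math.comb(n, m) * p ** m * q ** (n - m) for m in range(n + 1)]

lam = math.ceil(n * p)         # lambda - 1 < np <= lambda
mu = math.ceil(n * (p + eps))  # mu - 1 < np + n*eps <= mu
g = mu - lam
A = sum(T[lam:mu])             # P(0 <= m/n - p < eps)
C = sum(T[mu:])                # P(m/n - p >= eps)

# alpha: the least integer with (p/(p+eps))**alpha <= eta, as in d.
alpha = 1
while (p / (p + eps)) ** alpha > eta:
    alpha += 1

bound3 = (Fraction(n - mu + alpha, mu - alpha + 1) * p / q) ** alpha
print(n, lam, mu, g, alpha)  # 264 80 106 26 9
print(alpha <= g, T[mu] / T[lam] < bound3 <= (p / (p + eps)) ** alpha <= eta)
print(C < A * eta / (1 - eta))
```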

e. Let us now consider the inequalities

$$-\epsilon < \frac{m}{n} - p < 0 \qquad \text{and} \qquad \frac{m}{n} - p \le -\epsilon$$

and introduce their respective probabilities B and D. These inequalities
are equivalent to

$$0 < \frac{n - m}{n} - q < \epsilon, \qquad \frac{n - m}{n} - q \ge \epsilon.$$

It is apparent that we can interpret B or D as probabilities that the number
of occurrences m′ = n − m of the event F opposite to E in n trials will
satisfy either the inequality $0 < \frac{m'}{n} - q < \epsilon$ or $\frac{m'}{n} - q \ge \epsilon$. Since
the right-hand side of (5) contains only the given numbers ε and η, and not p, it is clear that

$$D < \frac{B\eta}{1 - \eta} \tag{7}$$

if (5) is satisfied.

Now A + B = P is the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon,$$

and C + D = Q is the probability of the opposite inequality

$$\left|\frac{m}{n} - p\right| \ge \epsilon.$$

Hence P + Q = 1. Moreover, by (6) and (7)

$$Q < \frac{P\eta}{1 - \eta}.$$

Consequently,

$$P + \frac{P\eta}{1 - \eta} > 1 \qquad \text{or} \qquad P > 1 - \eta$$

if only

$$n \ge \frac{1 + \epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This completes the proof of Bernoulli’s theorem.

For example, if p = q = 1/2 and ε = 0.01, η = 0.001, we get from (5)

$$n \ge 69{,}869,$$

which shows that in 69,869 trials or more there are at least 999 chances
against 1 that the relative frequency will differ from 1/2 by less than 1/100.
The number 69,869 found as a lower limit of the number of trials is
much too large. A much smaller number of trials would suffice to fulfill
all the requirements. From a practical standpoint, it is important to
find as low a limit as possible for the necessary number of trials (given ε
and η). With this problem we shall deal in the next chapter.
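The arithmetic of this example can be checked with a short Python sketch (the function names are hypothetical). `bernoulli_bound` evaluates the right-hand side of (5) with natural logarithms. `prob_within` sums the binomial probabilities of $|m/n - p| < \epsilon$ in floating point, using `math.lgamma` for the logarithms of the binomial coefficients.

```python
import math
from fractions import Fraction


def bernoulli_bound(eps, eta):
    """Least integer n with n >= (1 + eps)/eps**2 * log(1/eta) + 1/eps, i.e. (5)."""
    return math.ceil((1 + eps) / eps ** 2 * math.log(1 / eta) + 1 / eps)


def prob_within(n, p, eps):
    """P(|m/n - p| < eps) in n trials; p and eps should be exact Fractions."""
    log_p, log_q = math.log(float(p)), math.log(float(1 - p))
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for m in range(n + 1):
        if abs(Fraction(m, n) - p) < eps:  # exact comparison at the boundary
            log_term = (log_n_fact - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                        + m * log_p + (n - m) * log_q)
            total += math.exp(log_term)
    return total


print(bernoulli_bound(0.01, 0.001))                            # 69869
print(prob_within(40_000, Fraction(1, 2), Fraction(1, 100)))  # above 0.999
```

With p = 1/2 and ε = 1/100, the exact probability at n = 40,000 already exceeds 0.999. This agrees with the remark that the limit furnished by (5) is far from the smallest sufficient number of trials.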

2. Bernoulli’s theorem states that for arbitrarily given ε and η there
exists a number $n_0(\epsilon, \eta)$ such that for any single value $n > n_0(\epsilon, \eta)$ the
probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than 1 − η. The question naturally arises, whether for
given ε and η a number $N(\epsilon, \eta)$ depending upon ε and η can be found such
that the probability of the simultaneous inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for all $n \ge N(\epsilon, \eta)$ will still be greater than 1 − η. The following theorem
due to Cantelli shows that this question can be answered positively.

Cantelli’s Theorem. For given ε < 1, η < 1 let N be an integer
satisfying the inequality

$$N > \cdots\log\cdots + 2.$$

The probability that the relative frequencies of an event E will differ from
p by less than ε in the Nth and all the following trials is greater than 1 − η.
Proof. We shall prove first that the probability $Q_n$ of the inequality

$$\left|\frac{m}{n} - p\right| \ge \epsilon$$

will always be less than … . According to results proved in the
preceding section, for any η > 0

$$Q_n < \eta \qquad \text{if} \qquad n > \frac{1 + \epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This inequality, if we take η = …, becomes

$$n > \cdots$$

and in this form it is evident, since for ε < 1

$$\cdots < 1 - 2\log 2 < 0.$$

Hence, as stated,

$$Q_n < \cdots \tag{8}$$

The event A, in which we are interested, consists in simultaneous
fulfillment of all the inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for n = N, N + 1, N + 2, . . . . The opposite event B consists in
the fulfillment of at least one of the inequalities

$$\left|\frac{m}{n} - p\right| \ge \epsilon$$

where n can coincide either with N, or with N + 1, or with N + 2, . . . .
The probability of B, which we shall denote by R, certainly does not
exceed the sum of the probabilities of all the inequalities

$$\left|\frac{m}{n} - p\right| \ge \epsilon$$

for n = N, N + 1, N + 2, . . . . Consequently, referring to (8),

$$R \le \sum_{n=N}^{\infty} Q_n < 2\sum_{n=N}^{\infty} e^{\cdots} = \frac{2e^{\cdots}}{1 - e^{\cdots}}.$$

Consequently, if N satisfies the inequality stated in the theorem, we shall have
R < η and at the same time the probability of A will be greater than 1 − η,
which proves Cantelli’s theorem.


Significance of Bernoulli’s Theorem 

3. As was indicated in the Introduction, one of the most important 
problems in the theory of probability consists in the discovery of cases 
where the probability is very near to 0 or, on the contrary, very near to 1, 
because cases with very small or very great probability may have real
practical interest. In Bernoulli’s theorem we have a case of this kind;
the theorem shows that with the probability approaching as near to 1
or certainty as we please, we may expect that in a sufficiently long
series of independent trials with constant probability, the relative
frequency of an event will differ from that probability by less than any
specified number, no matter how small. But it lies in the nature of the
idea of mathematical probability, that when it is near 1, or, on the
contrary, very small, we may consider an event with such probability as
practically certain in the first case, and almost impossible in the second.
The reason is purely empirical.

To illustrate what we mean, let us consider an indefinite series of
independent trials, in which the probability of a certain event remains
constantly equal to 1/2. It can be shown that if the number of trials
is, for instance, 40,000 or more, we may expect with a probability > 0.999
that the relative frequency of the event will differ from 1/2 by less than
0.01. In other words, we are entitled to bet at least 999 against 1 that
the actual number of occurrences will lie between the limits 0.49n and
0.51n if n ≥ 40,000. If we could make a positive statement of this
kind without any mention of probability, we should be offering an ideal
scientific prediction. However, our knowledge in this case is incomplete
and all we are entitled to state is this: we are more sure to be right in
predicting the above limits for the number of occurrences than in
expecting to draw a white ball from an urn containing 999 white and only 1
black ball.

In practical matters, where our actions almost never can be directed 
with perfect confidence, even incomplete knowledge may be taken as a 
sure guide. Whoever has tried to win on a single ticket out of 10,000 
knows from experience that it is virtually impossible. Now the conviction
of impossibility would be still greater if one tried to win on a single
ticket out of 1,000,000. 

In the light of such examples, we understand what value may be 
attached to statements derived from Bernoulli’s theorem: Although the 
fact we expect is not bound to happen, the probability of its happening 
is so great that it may really be considered as certain. Once in a great 
while facts may happen contrary to our expectations, but such rare exceptions
cannot outweigh the advantages in everyday life of following the
indications of Bernoulli’s theorem. And herein lies its immense practical 
value and the justification of a science like the theory of probability. 

It should, however, be borne in mind that little, if any, value can be 
attached to practical applications of Bernoulli’s theorem, unless the 
conditions presupposed in this theorem are at least approximately fulfilled:
independence of trials and constant probability of an event for
every trial. And in questions of application it is not easy to be sure 
whether one is entitled to make use of Bernoulli’s theorem; consequently, 
it is too often used illegitimately. 

It is easy to understand how essential it is to discover propositions 
of the same character under more general conditions, paying especial 
attention to the possible dependence of trials. There have been valuable 
achievements in this direction. In the proper place, we shall discuss the 
more important generalizations of Bernoulli’s theorem. 

4. When the probability of an event in a single experiment is known, 
Bernoulli’s theorem may serve as a guide to indicate approximately how 
often this event can be expected to occur if the same experiments are 
repeated a considerable number of times under nearly the same conditions.
When, on the contrary, the probability of an event is unknown
and the number of experiments is very large, the relative frequency of
that event may be taken as an approximate value of its probability.
Bernoulli himself, in establishing his theorem, had in mind the approximate
evaluation of unknown probabilities from repeated experiments.
That is evident from his explanations preceding the statement of the
theorem itself and its proof. Inasmuch as these explanations are interesting
in themselves, and present the original thoughts of the great discoverer,
we deem it advisable here to give a free translation from Bernoulli’s
book. After calling attention to the fact that only in a few cases can
probabilities be found a priori, Bernoulli proceeds as follows: 

So, for example, the number of cases for dice is known. Evidently there are 
as many cases for each die as there are faces, and all these cases have an equal 
chance to materialize. For, by virtue of the similitude of faces and the uniform 
distribution of weight in a die, there is no reason why one face should show up 
more readily than another, as there would be if the faces had a different shape 
or if one part of a die were made of heavier material than another. So one knows 
the number of cases when a white or a black ticket can be drawn from an urn, 
and besides, it is known that all these cases are equally possible, because the numbers
of tickets of both kinds are determined and known, and there is no apparent
reason why one of these tickets could be drawn more readily than any other.
But, I ask you, who among mortals will ever be able to define as so many cases,
the number, e.g., of the diseases which invade innumerable parts of the human
body at any age and can cause our death? And who can say how much more
easily one disease than another (plague than dropsy, dropsy than fever) can
kill a man, to enable us to make conjectures about the future state of life or 
death? Who, again, can register the innumerable cases of changes to which the 
air is subject daily, to derive therefrom conjectures as to what will be its state 
after a month or even after a year? Again, who has sufficient knowledge of the 
nature of the human mind or of the admirable structure of our body to be able, 
in games depending on acuteness of mind or agility of body, to enumerate cases 
in which one or another of the participants will win? Since such and similar 
things depend upon completely hidden causes, which, besides, by reason of the 
innumerable variety of combinations will forever escape our efforts to detect 
them, it would plainly be an insane attempt to get any knowledge in this fashion. 

However, there is another way to obtain what we want. And what is impossible
to get a priori, at least can be found a posteriori; that is, by registering the
results of observations performed a great many times. Because it must be presumed
that something may occur or not occur as many times as it had previously
been observed to occur or not occur under similar conditions. For instance, if,
in the past, 300 men of the same age and physical build as Titus is now, were
investigated, and it were found that 200 of them had died within a decade, the
others continuing to enjoy life past this term, one could pretty safely conclude
that there are twice as many cases for Titus to pay his debt to nature within the
next decade than to survive beyond this term. So it is, if somebody for many
preceding years had observed the weather and noticed how many times it was
fair or rainy; or if somebody attended games played by two persons a great many 
times and noticed how often one or the other won; by these very observations he 
would be able to discover the ratio of cases which in the future might favor the 
occurrence or failure of the same event under similar circumstances. 

And this empirical way of determining the number of cases by experiments is 
neither new nor unusual. For the author of the book “Ars cogitandi,” a man
of great acumen and ingenuity, in Chap. 12 recommends a similar procedure,
and everybody does the same in daily practice. Moreover, it cannot be concealed
that for reasoning in this fashion about some event, it is not sufficient to
make a few experiments, but a great quantity of experiments is required; because
even the most stupid ones by some natural instinct and without any previous
instruction (which is rather remarkable) know that the more experiments are 
made, the less is the danger to miss the scope. 

Although this is naturally known to anyone, the proof based on scientific 
principles is by no means trivial, and it is our duty now to explain it. However, 
I would consider it a small achievement if I could only prove what everybody 
knows anyway. There remains something else to be considered, which perhaps 
nobody has even thought of. Namely, it remains to inquire, whether by thus 
augmenting the number of experiments the probability of getting a genuine ratio 
between numbers of cases, in which some event may occur or fail, also augments 
itself in such a manner as finally to surpass any given degree of certitude; or 
whether the problem, so to speak, has its own asymptote; that is, there exists a 
degree of certitude which never can be surpassed no matter how the observations 
are multiplied; for instance, that it never is possible to have a probability greater 
than … that the real ratio has been attained. To illustrate this by an
example, suppose that, without your knowledge, 3,000 white stones and 2,000 
black stones are concealed in a certain urn, and you try to discover their numbers 
by drawing one stone after another (each time putting back the stone drawn 
before taking the next one, in order not to change the number of stones in the 
urn) and notice how often a white or a black stone appears. The question is, 
can you make so many drawings as to make it 10, or 100, or 1,000, etc., times 
more probable (that is, morally certain) that the ratio of frequencies of white and 
black stones will be 3 to 2, as is the case with the number of stones in the urn, 
than any other ratio different from that? If this were not true, I confess nothing 
would be left of our attempt to explore the number of cases by experiments. 
But if this can be attained and moral certitude can finally be acquired (how that 
can be done I shall show in the next chapter), we shall have cases enumerated a 
posteriori with almost the same confidence as if they were known a priori. And 
that, for practical purposes, where “morally certain” is taken for “absolutely
certain” by Axiom 9, Chap. II, is abundantly sufficient to direct our conjectures 
in any contingent matter not less scientifically than in games of chance. 

For if instead of an urn we take the air or the human body, that contain in 
themselves sources of various changes or diseases as the urn contains stones, we 
shall be able in the same manner to determine by observations how much more 
likely one event is to happen than another in these subjects. 

To avoid misunderstanding, one must bear in mind that the ratio of cases 
which we want to determine by experiments should not be taken in the sense of a 
precise and indivisible ratio (for then just the contrary would happen, and the 
probability of attaining a true ratio would diminish with the increasing number of 
observations) but as an approximate one; that is, within two limits, which, 
however, can be taken as near as we wish to each other. For instance, if, in the 
case of the stones, we take pairs of ratios 301/200 and 299/200, or 3001/2000 and
2999/2000, etc., it can be shown that it will be more probable than any degree of
probability that the ratio found in experiments will fall within these limits than
outside of them. Such, therefore, is the problem which we have decided to
publish here, now that we have struggled with it for about twenty years. The 
novelty of this problem as well as its great utility, combined with equal difficulty,
may add to the weight and value of other parts of this doctrine.—“Ars Conjec- 
tandi,” pars quarta. Cap. IV, pp. 224-227. 

Application to Games of Chance 

5. One of the cases in which the conditions for application of Bernoulli’s
theorem are fulfilled is that of games of chance. It is not out
of place to discuss the question of the commercial values of games from 
the standpoint of Bernoulli's theorem. “Game of chance” is the term 
we apply to any enterprise which may give us profit or may cause us 
loss, depending on chance, the probabilities of gain or loss being known. 
The following considerations can be applied, therefore, to more serious 
questions and not only to games played for pastime or for the sake of 
gaining money, as in gambling. 

Suppose that, by the conditions of the game, a player can win a 
certain sum a of money, with the probability p; or can lose another 
sum $b$ with the probability $q = 1 - p$.

If this game can be repeated any number of times under the same 
conditions, the question arises as to the probability for a player to gain 
or lose a sum of money not below a given limit. Let us denote by n 
the total number of games, and by m the number of times the player 
wins. Considering a loss as a negative gain, his total gain will be 

$$K = ma - (n - m)b.$$

It is convenient to introduce instead of $m$ another number $\alpha$ defined by

$$\alpha = m - np$$

and called “discrepancy.” Expressed in terms of $\alpha$ the preceding expression
for the gain becomes

$$K = n(pa - qb) + (a + b)\alpha.$$

The expression

$$E = pa - qb$$

entering as the coefficient of $n$ has, as we shall see, an important bearing
on the conclusion as to the commercial value of the game. It is called the
“mathematical expectation” of the player. Suppose at first that this
expectation is positive. By Bernoulli’s theorem the probability for a
discrepancy less than $-n\varepsilon$, $\varepsilon$ being an arbitrary positive number, is
smaller than any given number, provided, of course, the number of games
is sufficiently large. At the same time, with the probability approaching
1 as near as we please, we may expect the discrepancy to be $\ge -n\varepsilon$.
However, if this is the case, the total gain will surpass the number

$$n[E - \varepsilon(a + b)]$$

which, for sufficiently large $n$, itself is greater than any specified positive
number. It is supposed, of course, that $\varepsilon$ is small enough to make the
difference

$$E - \varepsilon(a + b)$$

positive. And that means that the player whose mathematical expectation
is positive may expect with a probability approaching certainty as
near as we please to gain an arbitrarily large amount of money if nothing 
prevents him from playing a sufficient number of games. 
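As a minimal sketch of this argument, the Python below checks the identity $K = n(pa - qb) + (a + b)\alpha$ exactly and simulates a long run of games to show the average gain per game, $K/n$, settling near $E$. The stakes $a = 3$, $b = 2$ and $p = 1/2$ are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def total_gain(m, n, a, b):
    """Gain K = m*a - (n - m)*b after m wins in n games; a loss is a negative gain."""
    return m * a - (n - m) * b


def gain_via_discrepancy(m, n, a, b, p):
    """The same gain written as K = n(pa - qb) + (a + b)*alpha, alpha = m - np."""
    q = 1 - p
    alpha = m - n * p
    return n * (p * a - q * b) + (a + b) * alpha


def simulated_average_gain(p, a, b, n, seed=0):
    """Play n independent games with win probability p; return K/n."""
    rng = random.Random(seed)
    m = sum(1 for _ in range(n) if rng.random() < p)
    return total_gain(m, n, a, b) / n


# Illustrative stakes only (not from the text): win 3 or lose 2, with p = 1/2.
p, a, b = Fraction(1, 2), 3, 2
E = p * a - (1 - p) * b          # mathematical expectation per game
print(E)                         # 1/2
print(simulated_average_gain(float(p), a, b, 100_000))   # close to E for large n
```

Exact rationals are used for the identity so that both sides compare equal without rounding; the simulation uses an ordinary float probability.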

On the contrary, by a similar argument, we can see that in case of 
a negative mathematical expectation, the player has an arbitrarily small 
probability to escape a loss of an arbitrarily large amount of money, 
again under the condition that he plays a sufficiently large number of 
games. 

Finally, if the mathematical expectation is 0, it is impossible to make 
any definite statement concerning the gain or loss by the player, except
that it is very unlikely that the amount of gain or loss will be considerable 
compared with the number of games. 

It follows from this discussion that the game is certainly favorable 
for the player if his mathematical expectation is positive, and unfavorable 
if it is negative. In case the mathematical expectation is 0, neither 
of the parties participating in the game has a decided advantage and then 
the game is called equitable. Usually, games serving as amusements are 
equitable. On the contrary, all of the games operated for commercial 
purposes by individuals or corporations are expressly made to be profitable
for the administration; that is, the mathematical expectation of the
administration of a game operated for lucrative purposes is positive at 
each single turn of the game and, correspondingly, the expectation of any 
gambler is negative. This confirms the common observation that those 
gamblers who extend their gambling over large numbers of games are 
almost inevitably ruined. At the same time, the theory agrees with 
the fact that great profits are derived by the administrations of gaming 
places. 

A good illustration is afforded by the French lottery mentioned on 
page 19, which, as is well known, was a very profitable enterprise operated 
by the French government. Now, if we consider the mathematical 
expectation of ticket holders in that lottery, we find that it was negative 
in all cases; namely, denoting by M the sum paid for tickets, we find the 
following expectations: 

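On 1 ticket: $\left(\frac{15}{18} - 1\right)M = -\frac{1}{6}M$,

On 2 tickets: $\left(\frac{540}{801} - 1\right)M = -\frac{29}{89}M$,

On 3 tickets: $\left(\frac{5500}{11748} - 1\right)M = -\frac{142}{267}M$,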


and so forth. 
On the other hand, the expectation of the administration was always
positive, and because of the great number of persons taking part in this 
lottery, the number of games played by the administration was enormous, 
and it was assured of a steady and considerable income. This was an 
enterprise avowedly operated for the purpose of gambling, but the same 
principles underlie the operations of institutions having great public 
value, such as insurance companies, which, to secure their income, always 
reserve certain advantages for themselves. 

Experimental Verification of Bernoulli’s Theorem 

6 . Bernoulli’s theorem, like any other mathematical proposition, is 
a deduction from ideal premises. To what extent these premises may be 
considered as a good approximation to reality can be decided only by 
experiments. Several experiments designed to test various theoretical
statements derived from general propositions of the theory of probability
are reported by different authors. Here we shall
discuss those purporting to test Bernoulli’s theorem. 

I. Buffon, the French naturalist of the eighteenth century, tossed a 
coin 4,040 times and obtained 2,048 heads and 1,992 tails. Assuming 
that his coin was ideal, we have a probability of 1/2 for either heads or
tails. Now, the relative frequencies obtained by his experiments are:

$$\frac{2048}{4040} = 0.507 \text{ for heads}, \qquad \frac{1992}{4040} = 0.493 \text{ for tails}$$

and they differ very little from the corresponding probabilities, 0.500. 
In this case, the conclusions one might derive from Bernoulli’s theorem 
are verified in a very satisfactory manner. 

II. De Morgan, in his book “Budget of Paradoxes” (1872), reports
the results of four similar experiments. In each of them a coin was
tossed 2,048 times and the observed frequencies of heads were, respectively,
1,061, 1,048, 1,017, 1,039. The relative frequencies corresponding
to these numbers are

$$\frac{1061}{2048} = 0.518; \quad \frac{1048}{2048} = 0.512; \quad \frac{1017}{2048} = 0.497; \quad \frac{1039}{2048} = 0.507.$$

The agreement with the theory again is satisfactory. 

III. Charlier, in his book “Grundzüge der mathematischen Statistik,”
reports the results of 10,000 drawings of one playing card out of a full
deck. Each card drawn was returned to the deck before the next drawing.
The actual result of these experiments was that black cards
appeared 4,933 times, and consequently the frequency of red cards was
5,067. The relative frequencies in this instance are:

$$\frac{4933}{10000} = 0.4933 \text{ for a black card}, \qquad \frac{5067}{10000} = 0.5067 \text{ for a red card}$$





and they differ but slightly from the probability, 0.5000, that the card
drawn will be black or red. The agreement between theory and experiment
in this case, too, is satisfactory.
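As a quick arithmetic check of experiments I to III, the Python sketch below recomputes each relative frequency $m/n$ from the counts quoted above, together with the discrepancy $m - np$ for $p = 1/2$, in the sense defined earlier in this chapter. The list `EXPERIMENTS` and the helper functions are illustrative names, not from the text.

```python
# Observed counts quoted in experiments I-III: (label, successes m, trials n).
EXPERIMENTS = [
    ("Buffon, heads", 2048, 4040),
    ("De Morgan 1, heads", 1061, 2048),
    ("De Morgan 2, heads", 1048, 2048),
    ("De Morgan 3, heads", 1017, 2048),
    ("De Morgan 4, heads", 1039, 2048),
    ("Charlier, black cards", 4933, 10000),
]


def relative_frequency(m, n):
    """Relative frequency m/n."""
    return m / n


def discrepancy(m, n, p=0.5):
    """Discrepancy m - np, with p = 1/2 for every experiment listed here."""
    return m - n * p


for label, m, n in EXPERIMENTS:
    print(f"{label:22s} m/n = {relative_frequency(m, n):.4f}   m - np = {discrepancy(m, n):+.1f}")
```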

IV. The author of this book made the following experiment with
playing cards: After excluding the 12 face cards from the pack, 4 cards
were drawn at a time from the remaining 40, and the number of trials
was carried to 7,000. The number of times in each thousand that the
four cards belonged to different suits was:

Thousand:    I     II    III   IV    V     VI    VII
Frequency:   113   113   103   105   105   118   108

Altogether the frequency of such cases was 765 in 7,000 trials, whence
we find for the relative frequency

$$\frac{765}{7000} = 0.1093$$

while the probability for taking 4 cards belonging to different suits is

$$\frac{10^4}{\binom{40}{4}} = \frac{1000}{9139} = 0.1094.$$
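A short sketch, using Python's `fractions` and `math.comb`, recomputes the probability quoted above: with the 12 face cards removed each suit has 10 cards, so $10^4$ of the $\binom{40}{4}$ possible hands contain one card of each suit. It also totals the seven per-thousand counts. The helper name is hypothetical.

```python
from fractions import Fraction
from math import comb

# Frequencies per thousand trials, as reported in experiment IV.
COUNTS_PER_THOUSAND = [113, 113, 103, 105, 105, 118, 108]


def prob_all_suits_different(cards_per_suit=10, suits=4):
    """Draw `suits` cards from suits * cards_per_suit; favorable: one card of each suit."""
    favorable = cards_per_suit ** suits
    total = comb(cards_per_suit * suits, suits)
    return Fraction(favorable, total)


p = prob_all_suits_different()
observed = sum(COUNTS_PER_THOUSAND)
print(p, round(float(p), 4))                # 1000/9139 0.1094
print(observed, round(observed / 7000, 4))  # 765 0.1093
```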

V. In J. L. Coolidge’s “Introduction to Mathematical Probability,”
one finds a reference to an experiment made by Lieutenant R. S. Hoar, 
U.S.A., but the reported results are incomplete. The author of this book 
repeated the same experiment which consisted in 1,000 drawings of 5 cards 
at a time, from a full pack of 52 cards. The results were: 503 times the 
5 cards were each of different denominations; 436 times 2 were of the same 
denomination with 3 scattered; 45 times there were 2 pairs of 2 different 
denominations and 1 odd card; 14 times 3 were of the same denomination 
with 2 scattered; 2 times there were 2 of one denomination and 3 of 
another. The remaining possible combination, 4 cards of the same 
denomination with 1 odd, never appeared. The probabilities of these 
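different cases are, respectively,

$$\frac{2112}{4165} = 0.507; \quad \frac{1760}{4165} = 0.423; \quad \frac{198}{4165} = 0.048;$$

$$\frac{88}{4165} = 0.021; \quad \frac{6}{4165} = 0.001; \quad \frac{1}{4165} = 0.000.$$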

The corresponding theoretical frequencies are 507, 423, 48, 21, 1, 0,
while the observed frequencies were 503, 436, 45, 14, 2, 0. The discrepancies
are generally small and the greatest of them, 13, is still within
reasonable limits. Deeper investigation shows that the probability that
a discrepancy will not exceed 13 is about 1/2; hence, the observed deviation
of 13 units cannot be considered abnormal.
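The six probabilities above can be recomputed by counting five-card hands by their pattern of denominations. The sketch below uses standard counting formulas (not taken from the text); “all different” includes straights and flushes, matching the text's “each of different denominations.” The dictionary names are illustrative.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 5)  # 2,598,960 five-card hands

# Number of hands for each denomination pattern.
COUNTS = {
    "all different": comb(13, 5) * 4**5,
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
    "two pairs": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    "three and two": 13 * comb(4, 3) * 12 * comb(4, 2),
    "four of a kind": 13 * 48,
}
OBSERVED = {"all different": 503, "one pair": 436, "two pairs": 45,
            "three of a kind": 14, "three and two": 2, "four of a kind": 0}

for name, count in COUNTS.items():
    prob = Fraction(count, TOTAL)
    expected = 1000 * prob      # theoretical frequency in 1,000 drawings
    print(f"{name:16s} p = {float(prob):.3f}  expected {float(expected):6.1f}  observed {OBSERVED[name]}")
```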

VI. Bancroft H. Brown published, in the American Mathematical 
Monthly, vol. 26, page 351, the results of a series of 9,900 games of craps. 
This game is played with two dice, and the caster wins unconditionally
if he produces 7 or 11 points, which are called “naturals”; he loses the
game in case of 2, 3, or 12 points, called “craps.” But if he produces
4, 5, 6, 8, 9, or 10 “points,” he does not win, but has the right to cast the 
dice an unlimited number of times until he throws the same number of 
points that he had before, or until he throws a 7. If he throws 7 before 
obtaining his point, he loses the game; otherwise he wins. 

It is a good exercise to find the probability of winning this game.
It is

$$p = \frac{244}{495} = 0.493,$$

that is, a little less than 1/2. Multiplying the number of games, in our
case 9,900, by this probability, we find that the theoretical number of 
successes is 4,880 and of failures, 5,020. Now, according to Bancroft H. 
Brown, the actual numbers of successes and losses are, respectively, 
4,871 and 5,029. The discrepancy 

4871 - 4880 = -9 

is extremely small, even smaller than could reasonably be expected. 
The same article gives the number of times “craps” were produced; 
namely, 2 appeared 259 times, 3 appeared 508 times, and 12 appeared 
293 times, making the total number of craps 1,060. The probability 
of obtaining craps is

$$\frac{1}{36} + \frac{2}{36} + \frac{1}{36} = \frac{1}{9};$$

hence, the theoretical number of craps should be 1,100. The discrepancy, 
1060 — 1100 = —40, is more considerable this time but still lies within 
reasonable limits. 
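The value 244/495, left above as an exercise, can be obtained by conditioning on the first throw. Once a point $k$ is set, the game ends at the first throw showing $k$ or 7, so the chance of making the point is $P(k)/(P(k) + P(7))$. The Python sketch below assumes fair dice; the function names are illustrative.

```python
from fractions import Fraction


def two_dice_distribution():
    """Probability of each total 2..12 for two fair dice."""
    dist = {}
    for a in range(1, 7):
        for b in range(1, 7):
            dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)
    return dist


def craps_win_probability():
    d = two_dice_distribution()
    win = d[7] + d[11]                        # naturals on the first throw
    for point in (4, 5, 6, 8, 9, 10):
        # after a point is set, only `point` or 7 ends the game
        win += d[point] * d[point] / (d[point] + d[7])
    return win


def craps_probability():
    d = two_dice_distribution()
    return d[2] + d[3] + d[12]


p_win = craps_win_probability()
print(p_win, round(float(p_win), 3))                 # 244/495 0.493
print(9900 * p_win, 9900 * craps_probability())      # 4880 1100
```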

VII. E. Czuber made a complete investigation of lotteries operated
on the same plan as the French lottery, in Prague between 1754 and 1886,
and in Brünn between 1771 and 1886. The number of drawings was
2,854 in Prague and 2,703 in Brünn. The probability that in each drawing
the sequence of numbers is either increasing or decreasing is

$$\frac{2}{5!} = \frac{1}{60} = 0.01667$$

while the observed relative frequency of such cases was
Prague: 0.01612; Brünn: 0.01739
and in both places combined 

0.01674. 

The probabilities that among five numbers in each drawing there is
none or only one of the numbers 1, 2, 3, . . . 9 are, respectively,

0.58298 and 0.34070.

The corresponding relative frequencies were

Prague: 0.58655 and 0.32656
Brünn: 0.57899 and 0.34591

and in both places combined 

0.58183 and 0.33587, respectively. 

The probability of drawing a determined number is $5/90 = 1/18$. Now, according
to Czuber, for the lottery in Prague the actual number of occurrences for 
single tickets varied from 138 (for No. 6) to 189 (for No. 83), so that for 
all tickets the discrepancy varied from —20 to 31. Besides, there were 
only 16 numbers with a discrepancy greater than 15 in absolute value. 
All these results stand in good accord with the theory. 
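Assuming that “increasing or decreasing” refers to the order in which the five numbers are drawn (all $5!$ orders equally likely, exactly one increasing and one decreasing), the probabilities quoted for Czuber's data follow from elementary counting. The sketch below recomputes them with exact fractions; the names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

TOTAL_NUMBERS, DRAWN = 90, 5   # plan of the French lottery: 5 numbers out of 90

# Of the 5! equally likely orders of five distinct numbers, one is increasing
# and one is decreasing.
p_monotone = Fraction(2, factorial(DRAWN))


def p_exactly_k_small(k, small=9):
    """Probability that exactly k of the drawn numbers lie among 1, 2, ..., small."""
    favorable = comb(small, k) * comb(TOTAL_NUMBERS - small, DRAWN - k)
    return Fraction(favorable, comb(TOTAL_NUMBERS, DRAWN))


p_given_number = Fraction(DRAWN, TOTAL_NUMBERS)   # chance that a given number is drawn
expected_in_prague = 2854 * p_given_number         # expected occurrences of each number

print(p_monotone, round(float(p_monotone), 5))
print(round(float(p_exactly_k_small(0)), 5), round(float(p_exactly_k_small(1)), 5))
print(p_given_number, round(float(expected_in_prague), 1))
```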

VIII. One of the most striking experimental tests of Bernoulli’s 
theorem was made in connection with a problem considered for the first 
time by Buffon. A board is ruled with a series of equidistant parallel
lines, and a very fine needle, which is shorter than the distance between
lines, is thrown at random on the board. Denoting by $l$ the length of
the needle and by $h$ the distance between lines, the probability that the
needle will intersect one of the lines (the other possibility is that the
needle will be completely contained within the strip between two lines) is
found to be

$$p = \frac{2l}{\pi h}.$$

The remarkable thing about this expression is that it contains the
number $\pi = 3.14159\ldots$ expressing the ratio of the circumference of a
circle to its diameter. In the appendix we shall indicate how this expression
can be obtained, because in this problem we deal with a different
concept of probability.

Suppose we throw the needle a great many times and count the
number of times it cuts the lines. By Bernoulli’s theorem we may expect
that the relative frequency of intersections will not differ greatly from
the theoretical probability, so that, equating them, we have the means of
finding an approximate value of $\pi$.

One series of experiments of this kind was performed by R. Wolf, 
astronomer in Zurich, between 1849 and 1853. In his experiments the 
width of the strips was 45 mm., and the length of the needle was 36 mm. 
Thus the theoretical probability of intersections is

$$p = \frac{72}{45\pi} = 0.5093.$$

The needle was thrown 5,000 times and it cut the lines 2,532 times;
whence, the relative frequency

$$\frac{2532}{5000} = 0.5064.$$

The agreement between the two numbers is very satisfactory. If,
relying on Bernoulli’s theorem, we set the approximate equation

$$\frac{72}{45\pi} = 0.5064,$$

we should find the number 3.1596 for $\pi$, which differs from the known
value of $\pi$ by less than 0.02.

In another experiment of the same kind reported by De Morgan in
the aforementioned book, Ambrose Smith in 1855 made 3,204 trials with
a needle the length of which was 3/5 of the distance between lines. There
were 1,213 clear intersections, and 11 contacts on which it was difficult
to decide. If on this ground we should consider half of them as intersections,
we should obtain about 1,218 intersections in 3,204 trials, which
would give the number 3.155 for $\pi$. If all of the contacts had been treated
as intersections the result would have been 3.1412, very close to the
real value of $\pi$.
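Solving $2l/(\pi h) = (\text{crossings})/(\text{throws})$ for $\pi$ reproduces the estimates quoted above. The sketch below takes the ratio $l/h = 3/5$ for Ambrose Smith's needle, the value consistent with both of his quoted estimates, and counts each undecided contact as half an intersection in the first case. The function name is hypothetical.

```python
def estimate_pi(needle, spacing, crossings, throws):
    """Solve 2l/(pi h) = crossings/throws for pi (Buffon's needle, l < h)."""
    freq = crossings / throws
    return 2 * needle / (spacing * freq)


# Wolf: h = 45 mm, l = 36 mm, 2,532 crossings in 5,000 throws.
print(round(estimate_pi(36, 45, 2532, 5000), 4))            # 3.1596
# Ambrose Smith: l/h = 3/5, 3,204 throws, 1,213 clear crossings, 11 contacts.
print(round(estimate_pi(3, 5, 1213 + 11 / 2, 3204), 3))     # 3.155
print(round(estimate_pi(3, 5, 1213 + 11, 3204), 4))         # 3.1412
```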

In an excellent book “Calcolo delle Probabilità,” vol. 1, page 183,
1925, by G. Castelnuovo, reference is made to experiments performed by
Professor Reina under whose direction a needle of 3 cm. in length was
thrown 2,520 times, the distance between lines being 6 cm. Taking into
account the thickness of the needle, the probability of intersection was
found to be 0.345, while actual experiments gave the relative frequency
of intersections as 0.341.

Appendix 

Buffon’s Needle Problem. Let $h$ be the width of the strip between
two lines and $l < h$ the length of the needle. The position of the needle
can be determined by the distance $x$ of its middle point from the nearest
line and the acute angle $\varphi$ formed by the needle and a perpendicular
dropped from the middle point to the line. It is apparent that $x$ may
vary from 0 to $h/2$ and $\varphi$ varies within the limits 0 and $\pi/2$. We cannot
define in the usual way the probability of the needle cutting the line, for
there are infinitely many cases with respect to the position of the needle.
However, it is possible to treat this problem as the limiting case of
another problem with a finite number of possible cases, where the usual
definition of probability can be applied.

Suppose that $h/2$ is divided into an arbitrary number $m$ of equal
parts $\delta = h/2m$ and the right angle $\pi/2$ into $n$ equal parts $\omega = \pi/2n$.
Suppose, further, that the distance $x$ may have only the values

$$0, \delta, 2\delta, \ldots, m\delta$$

and the angle $\varphi$ the values

$$0, \omega, 2\omega, \ldots, n\omega.$$

This gives

$$N = (m + 1)(n + 1)$$

cases as to the position of the needle, and it is reasonable to assume that
these cases are equally likely. To find the number of favorable cases, we
notice that the needle cuts one of the lines if $x$ and $\varphi$ satisfy the inequality

$$x < \frac{l}{2}\cos\varphi.$$

The number of favorable cases, therefore, is equal to the number of
systems of integers $i$, $j$ satisfying the inequality

$$(A)\qquad i\delta < \frac{l}{2}\cos j\omega$$

supposing that $i$ may assume only the values $0, 1, 2, \ldots, m$ and $j$ only
the values $0, 1, 2, \ldots, n$. Because we suppose $l < h$, the greatest
value of $i$ satisfying condition (A) is less than $m$ and we can disregard
the requirement that $i$ should be $\le m$. Now for given $j$ there are $k + 1$
values of $i$ satisfying (A) if $k$ denotes the greatest integer which is less
than

$$\frac{l}{2\delta}\cos j\omega.$$

In other words, $k$ is an integer determined by the conditions

$$k < \frac{l}{2\delta}\cos j\omega \le k + 1.$$

The number of possible values for $i$ corresponding to a given $j$ can
therefore be represented thus

$$\mu_j = \frac{l}{2\delta}\cos j\omega + \theta_j$$

where $\theta_j$ may depend on $j$ but for all $j$ is $\ge 0$ and $< 1$. Taking the sum
of all the $\mu_j$ corresponding to $j = 0, 1, 2, \ldots, n$, we obtain the number
of favorable cases

$$M = \frac{l}{2\delta}(1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega) + n\theta$$

where $\theta$ again is a number satisfying the inequalities

$$0 \le \theta < 1.$$
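But, as is well known,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{\sin\left(n + \frac{1}{2}\right)\omega}{2\sin\frac{\omega}{2}}$$

or, because $\omega = \pi/2n$, so that $\sin\left(n + \frac{1}{2}\right)\omega = \cos\frac{\omega}{2}$,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{1}{2}\cot\frac{\omega}{2};$$

therefore

$$M = \frac{l}{4\delta}\cot\frac{\omega}{2} + \frac{l}{4\delta} + n\theta.$$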

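Dividing this by $N = (m + 1)(n + 1)$ and substituting for $\delta$ and $\omega$
their expressions

$$\delta = \frac{h}{2m}, \qquad \omega = \frac{\pi}{2n},$$

we obtain the probability in the problem with a finite number of cases

$$\frac{M}{N} = \frac{l}{2h}\cdot\frac{m}{m + 1}\cdot\frac{\cot\frac{\pi}{4n}}{n + 1} + \frac{l}{2h}\cdot\frac{m}{m + 1}\cdot\frac{1}{n + 1} + \frac{n\theta}{(n + 1)(m + 1)}.$$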

The probability in Buffon’s problem will be obtained by making $m$
and $n$ increase indefinitely in the above expression. Now, since

$$\lim\frac{m}{m + 1} = 1, \qquad \lim\frac{1}{n + 1} = 0, \qquad \lim\frac{n\theta}{(n + 1)(m + 1)} = 0$$

and

$$\lim\frac{\cot\frac{\pi}{4n}}{n + 1} = \frac{4}{\pi},$$

we have

$$\lim_{m,\,n \to \infty}\frac{M}{N} = \frac{2l}{\pi h}.$$

Thus we arrive at the expression of probability

$$p = \frac{2l}{\pi h}$$

in Buffon’s needle problem.
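The finite-case argument can be checked by direct counting. The Python sketch below (hypothetical function name) counts, for each $j$, the integers $i$ satisfying condition (A), forms $M/N$, and compares it with $2l/(\pi h)$; Wolf's dimensions $l = 36$, $h = 45$ serve only as an example.

```python
import math


def finite_case_probability(l, h, m, n):
    """M/N for the discretized needle problem of the appendix.

    x takes the values i*delta (i = 0..m), delta = h/(2m);
    phi takes the values j*omega (j = 0..n), omega = pi/(2n).
    A case is favorable when condition (A) holds: i*delta < (l/2) cos(j*omega).
    """
    delta = h / (2 * m)
    omega = math.pi / (2 * n)
    favorable = 0
    # j = n gives cos(pi/2) = 0, where (A) has no solution, so it is skipped
    # (this also avoids counting a spurious i = 0 from floating-point round-off).
    for j in range(n):
        bound = (l / (2 * delta)) * math.cos(j * omega)   # (A): i < bound
        favorable += min(math.ceil(bound), m + 1)         # i = 0, ..., ceil(bound) - 1
    return favorable / ((m + 1) * (n + 1))


l, h = 36.0, 45.0   # Wolf's needle length and strip width, in mm
for size in (10, 100, 1000):
    print(size, finite_case_probability(l, h, size, size))
print("limit 2l/(pi h) =", 2 * l / (math.pi * h))
```

The ratio approaches $2l/(\pi h) \approx 0.5093$ as $m$ and $n$ grow, in line with the limits taken above.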


Problems for Solution

Another very simple proof of Bernoulli’s theorem, due to Tshebysheff (1821-1894),
is based upon the following considerations:

1. Prove the following identities:

$$\sum_{m=0}^{n} T_m(m - np) = 0, \qquad \sum_{m=0}^{n} T_m(m - np)^2 = npq.$$

Indication of the Proof. Differentiate the identity

$$e^{-npu}(pe^{u} + q)^n = \sum_{m=0}^{n} T_m e^{(m - np)u}$$

twice with respect to $u$ and set $u = 0$.
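A small exact check of the two identities uses rational arithmetic, so the sums come out exactly $0$ and $npq$. The values $n = 12$, $p = 2/7$ are arbitrary illustrations, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p):
    """T_m = C(n, m) p^m q^(n - m) for m = 0..n, as exact rationals."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]


def central_moments(n, p):
    """Return (sum T_m (m - np), sum T_m (m - np)^2)."""
    T = binomial_terms(n, p)
    first = sum(t * (m - n * p) for m, t in enumerate(T))
    second = sum(t * (m - n * p) ** 2 for m, t in enumerate(T))
    return first, second


n, p = 12, Fraction(2, 7)
first, second = central_moments(n, p)
print(first, second, n * p * (1 - p))   # 0 120/49 120/49
```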

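2. If $Q$ is the probability of the inequality $|m - np| \ge n\varepsilon$, prove that

$$Q < \frac{pq}{n\varepsilon^2}.$$

Indication of the Proof. In the identity

$$\sum_{m=0}^{n} T_m(m - np)^2 = npq$$

drop all the terms in which $|m - np| < n\varepsilon$ and in the remaining terms replace
$(m - np)^2$ by $n^2\varepsilon^2$. The resulting inequality

$$n^2\varepsilon^2 \sum_{|m - np| \ge n\varepsilon} T_m < npq$$

is equivalent to the statement.

3. Prove that

$$P > 1 - \eta$$

if $n > pq/(\eta\varepsilon^2)$, where $P$ is the probability of the inequality $|m - np| < n\varepsilon$.

Indication of the Proof. $P = 1 - Q$, $Q < pq/(n\varepsilon^2)$ and $pq/(n\varepsilon^2) < \eta$ if $n > pq/(\eta\varepsilon^2)$.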
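To see how conservative the bound of Problem 2 is, the sketch below computes the exact probability $Q$ for a few values of $n$ in rational arithmetic and compares it with $pq/(n\varepsilon^2)$; it also evaluates the sample size of Problem 3. The values $p = 3/10$, $\varepsilon = 1/20$, $\eta = 1/100$ are illustrative only, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def tail_probability(n, p, eps):
    """Exact Q = P(|m - np| >= n*eps) for m binomial(n, p), in rational arithmetic."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1) if abs(m - n * p) >= n * eps)


def chebyshev_bound(n, p, eps):
    """The bound pq / (n eps^2) of Problem 2."""
    return p * (1 - p) / (n * eps**2)


def trials_needed(p, eps, eta):
    """Smallest integer n with n > pq / (eta eps^2), as in Problem 3."""
    return floor(p * (1 - p) / (eta * eps**2)) + 1


p, eps = Fraction(3, 10), Fraction(1, 20)
for n in (200, 1000):
    Q = tail_probability(n, p, eps)
    print(n, float(Q), float(chebyshev_bound(n, p, eps)))
print(trials_needed(p, eps, Fraction(1, 100)))   # 8401
```

Exact fractions matter for `trials_needed`: here $pq/(\eta\varepsilon^2)$ is exactly 8400, and floating-point rounding could otherwise land on the wrong side of that integer.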
The following two problems show how probability considerations can be used in 
proving purely analytical propositions. 

4. S. Bernstein’s Proof of Weierstrass’ Theorem. The famous theorem due to Weierstrass
states that for any continuous function $f(x)$ in a closed interval $a \le x \le b$ there
exists a polynomial $P(x)$ such that

$$|f(x) - P(x)| < \sigma$$

for $a \le x \le b$, where $\sigma$ is an arbitrary positive number. By a proper linear
transformation the interval $(a, b)$ can be transformed into the interval $(0, 1)$. According
to S. Bernstein, the polynomial

$$P(x) = \sum_{m=0}^{n} \frac{n!}{m!\,(n - m)!}\, f\!\left(\frac{m}{n}\right) x^m(1 - x)^{n - m}$$

for sufficiently large $n$ satisfies the inequality

$$|f(x) - P(x)| < \sigma$$

uniformly in the interval $0 \le x \le 1$.

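Indication of the Proof. For $x = 0$ and $x = 1$ we have $f(0) = P(0)$ and
$f(1) = P(1)$. It suffices to prove the statement for $0 < x < 1$. Let $x$ be a constant probability in
$n$ independent trials. We have

$$(a)\qquad f(x) - P(x) = \sum_{m=0}^{n} T_m\left[f(x) - f\!\left(\frac{m}{n}\right)\right].$$

By the property of continuous functions, there is a number $\varepsilon$ corresponding to any
positive number $\sigma$ such that

$$|f(x') - f(x)| < \frac{\sigma}{2}$$

whenever

$$|x' - x| < \varepsilon \qquad (0 \le x', x \le 1).$$

Also, there exists a number $M$ such that $|f(x)| \le M$ for $0 \le x \le 1$. From equation
(a) we get

$$|f(x) - P(x)| \le \frac{\sigma}{2}P + 2MR$$

where $P$ and $R$ are, respectively, the probabilities of the inequalities

$$\left|\frac{m}{n} - x\right| < \varepsilon \quad\text{and}\quad \left|\frac{m}{n} - x\right| \ge \varepsilon.$$

Now $P < 1$ and

$$R < \eta$$

if $n > 1/(4\varepsilon^2\eta)$. Take $\eta = \sigma/4M$; then

$$|f(x) - P(x)| < \sigma$$

if $n > M/(\sigma\varepsilon^2)$.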
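Bernstein's polynomial is easy to evaluate numerically. The sketch below builds $P(x)$ for the continuous but non-smooth function $|x - 1/2|$ (an arbitrary test choice) and reports the largest error on a grid of points. The error should shrink as $n$ grows, though slowly. The function names are hypothetical.

```python
from math import comb


def bernstein_polynomial(f, n):
    """Return P with P(x) = sum_{m=0}^{n} C(n, m) f(m/n) x^m (1 - x)^(n - m)."""
    coeffs = [comb(n, m) * f(m / n) for m in range(n + 1)]   # computed once per n

    def P(x):
        return sum(c * x**m * (1 - x) ** (n - m) for m, c in enumerate(coeffs))

    return P


def max_error(f, n, grid=201):
    """Largest |f(x) - P(x)| over evenly spaced points of [0, 1]."""
    P = bernstein_polynomial(f, n)
    xs = [k / (grid - 1) for k in range(grid)]
    return max(abs(f(x) - P(x)) for x in xs)


def kink(x):
    """Continuous on [0, 1] but not differentiable at x = 1/2."""
    return abs(x - 0.5)


errors = {n: max_error(kink, n) for n in (10, 100, 1000)}
print(errors)   # the maximum error decreases as n grows
```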


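5. Show that

$$\frac{\displaystyle\int_{\frac{m}{n}-\varepsilon}^{\frac{m}{n}+\varepsilon} x^m(1 - x)^{n - m}\,dx}{\displaystyle\int_0^1 x^m(1 - x)^{n - m}\,dx} > 1 - \frac{1}{2(n + 1)\varepsilon^2}$$

provided $0 < m < n$ and $\frac{m}{n} - \varepsilon > 0$, $\frac{m}{n} + \varepsilon < 1$ (Castelnuovo).

Indication of the Proof. By Prob. 6, Chap. IV, page 72, the ratio

$$\frac{\displaystyle\int_0^{\frac{m}{n}-\varepsilon} x^m(1 - x)^{n - m}\,dx}{\displaystyle\int_0^1 x^m(1 - x)^{n - m}\,dx}$$

represents the probability $Q$ of at least $m + 1$ successes in a series of $n + 1$
independent trials with constant probability

$$p = \frac{m}{n} - \varepsilon.$$

Set

$$m + 1 = (n + 1)p + (n + 1)\sigma$$

whence

$$\sigma = \frac{n - m}{n(n + 1)} + \varepsilon > \varepsilon.$$

But

$$Q < \frac{p(1 - p)}{(n + 1)\sigma^2} < \frac{1}{4(n + 1)\varepsilon^2}.$$

Hence

$$\frac{\displaystyle\int_0^{\frac{m}{n}-\varepsilon} x^m(1 - x)^{n - m}\,dx}{\displaystyle\int_0^1 x^m(1 - x)^{n - m}\,dx} < \frac{1}{4(n + 1)\varepsilon^2}$$

and by a similar argument

$$\frac{\displaystyle\int_{\frac{m}{n}+\varepsilon}^1 x^m(1 - x)^{n - m}\,dx}{\displaystyle\int_0^1 x^m(1 - x)^{n - m}\,dx} < \frac{1}{4(n + 1)\varepsilon^2}.$$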
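The identity borrowed from Chap. IV lets the ratio be computed without integration: the integral of $x^m(1 - x)^{n-m}$ over $(0, a)$, divided by the full integral, equals the probability of at least $m + 1$ successes in $n + 1$ trials with probability $a$. The sketch below compares that route with Simpson's rule and with Castelnuovo's bound. The values $m = 40$, $n = 100$, $\varepsilon = 0.1$ are arbitrary, and the function names are hypothetical.

```python
from math import comb


def binomial_tail(trials, prob, k):
    """Probability of at least k successes in `trials` trials with probability `prob`."""
    return sum(comb(trials, j) * prob**j * (1 - prob) ** (trials - j)
               for j in range(k, trials + 1))


def beta_ratio(m, n, lower, upper):
    """Integral of x^m (1-x)^(n-m) over [lower, upper] divided by the integral over [0, 1],
    using: ratio over [0, a] = P(at least m + 1 successes in n + 1 trials, probability a)."""
    return binomial_tail(n + 1, upper, m + 1) - binomial_tail(n + 1, lower, m + 1)


def simpson(g, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = g(a) + g(b)
    total += 4 * sum(g(a + (2 * k - 1) * h) for k in range(1, steps // 2 + 1))
    total += 2 * sum(g(a + 2 * k * h) for k in range(1, steps // 2))
    return total * h / 3


m, n, eps = 40, 100, 0.1


def integrand(x):
    return x**m * (1 - x) ** (n - m)


lower, upper = m / n - eps, m / n + eps
by_identity = beta_ratio(m, n, lower, upper)
by_simpson = simpson(integrand, lower, upper) / simpson(integrand, 0.0, 1.0)
bound = 1 - 1 / (2 * (n + 1) * eps**2)
print(by_identity, by_simpson, bound)
```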


References 

Jacob Bernoulli: “Ars Conjectandi,” pars quarta, 1713. 

P. L. Tshebysheff: “Sur les valeurs moyennes,” Oeuvres, I, pp. 687-694.

F. P. Cantelli: “Sulla probabilità come limite di frequenza,” Rend. d. R. Accad.
Naz. dei Lincei, 26, 1917.

A. Markoff: “Calculus of Probability,” 4th Russian ed., Leningrad, 1924.



CHAPTER VII 


APPROXIMATE EVALUATION OF PROBABILITIES IN 
BERNOULLIAN CASE 

1. In connection with Bernoulli's theorem, the following important
question arises: when the number of trials is large, how can one find, at
least approximately, the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon$$

where $\varepsilon$ is a given number? Or, in a more general form: How can one
find, approximately, the probability of the inequalities

$$l \le m \le l'$$

where $l$ and $l'$ are given integers, the number of trials $n$ being large?

The exact formula for this probability is

$$P = \sum_{s=l}^{l'} T_s$$

where $T_s$, as before, represents the probability of $s$ successes in $n$ trials.
While this formula cannot be of any practical use when $n$ and $l' - l$
are large numbers, yet it is precisely such cases that present the greatest
theoretical and practical interest. Hence, the problem naturally arises
of substituting for the exact expression of $P$ an approximate formula
which will be easy to use in practice and which, for large $n$, will give a
sufficiently close approximation to $P$. De Moivre was the first
successfully to attack this difficult problem. After him, in essentially the
same way, but using more powerful analytical tools, Laplace succeeded
in establishing a simple approximate formula which is given in all books
on probability.

When we use an approximate formula instead of an exact one, there
is always this question to consider: How large is the committed error?
If, as is usually done, this question is left unanswered, the derivation of
Laplace’s formula becomes an easy matter. However, to estimate the
error, a comparatively long and detailed investigation is required. Except
for its length, this investigation is not very difficult.

2. First we shall present the probability $T_s$ in a convenient analytical
form. The identity

$$F(t) = (pt + q)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n$$

after substituting $t = e^{i\varphi}$ becomes

$$F(e^{i\varphi}) = T_0 + T_1 e^{i\varphi} + T_2 e^{2i\varphi} + \cdots + T_n e^{ni\varphi}.$$

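Multiplying it by $e^{-si\varphi}$ and integrating between $-\pi$ and $\pi$, we get

$$\int_{-\pi}^{\pi} e^{-si\varphi}F(e^{i\varphi})\,d\varphi = 2\pi T_s$$

because for an integral exponent $k$

$$\int_{-\pi}^{\pi} e^{ki\varphi}\,d\varphi = \begin{cases} 0 & \text{if } k \ne 0,\\ 2\pi & \text{if } k = 0.\end{cases}$$

Thus

$$T_s = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-si\varphi}F(e^{i\varphi})\,d\varphi$$

and this is the expression for $T_s$ suitable for our purposes. To find the
sum

$$P = \sum_{s=l}^{l'} T_s$$

we observe first that

$$\sum_{s=l}^{l'} e^{-si\varphi} = \frac{e^{-li\varphi} - e^{-(l'+1)i\varphi}}{1 - e^{-i\varphi}} = \frac{e^{-(l-\frac{1}{2})i\varphi} - e^{-(l'+\frac{1}{2})i\varphi}}{2i\sin\frac{\varphi}{2}}.$$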

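On the other hand, the complex number $F(e^{i\varphi})$ can be presented in
trigonometrical form, thus:

$$F(e^{i\varphi}) = R\,e^{i\theta},$$

whence

$$P = \frac{1}{2\pi}\int_{-\pi}^{\pi} R\,\frac{e^{i[\theta-(l-\frac{1}{2})\varphi]} - e^{i[\theta-(l'+\frac{1}{2})\varphi]}}{2i\sin\frac{\varphi}{2}}\,d\varphi$$

or, because $P$ is real,

$$P = \frac{1}{2\pi}\int_{-\pi}^{\pi} R\,\frac{\sin[\theta-(l-\frac{1}{2})\varphi] - \sin[\theta-(l'+\frac{1}{2})\varphi]}{2\sin\frac{\varphi}{2}}\,d\varphi.$$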

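Finally, because $R$ is an even function of $\varphi$ and $\theta$ is an odd one, the integrand
is even, so we can take the integration over the interval $(0, \pi)$ only, on the condition that we
double the result. Thus we obtain

$$P = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin[\theta-(l-\frac{1}{2})\varphi] - \sin[\theta-(l'+\frac{1}{2})\varphi]}{\sin\frac{\varphi}{2}}\,d\varphi = \frac{1}{\pi}\int_0^{\pi} R\cos\left(\theta - \frac{l+l'}{2}\varphi\right)\frac{\sin\frac{l'-l+1}{2}\varphi}{\sin\frac{\varphi}{2}}\,d\varphi.$$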
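The integral representation can be checked numerically. Writing $R\sin(\theta - c\varphi) = \operatorname{Im}\,[F(e^{i\varphi})e^{-ic\varphi}]$ avoids choosing a branch for $\theta$. Because the integrand equals twice the real part of a trigonometric polynomial, the midpoint rule below should reproduce the exact binomial sum up to rounding error. This is a sketch with hypothetical names, not part of the original argument.

```python
import cmath
import math
from math import comb


def exact_probability(n, p, l, l_prime):
    """P = sum_{s=l}^{l'} T_s computed directly."""
    q = 1 - p
    return sum(comb(n, s) * p**s * q**(n - s) for s in range(l, l_prime + 1))


def integral_probability(n, p, l, l_prime, nodes=1000):
    """(1/2pi) * integral over (0, pi) of
    R [sin(theta - (l - 1/2)phi) - sin(theta - (l' + 1/2)phi)] / sin(phi/2),
    using R sin(theta - c phi) = Im(F(e^{i phi}) e^{-i c phi}) and the midpoint rule."""
    q = 1 - p
    step = math.pi / nodes
    total = 0.0
    for k in range(nodes):
        phi = (k + 0.5) * step
        F = (p * cmath.exp(1j * phi) + q) ** n
        diff = F * (cmath.exp(-1j * (l - 0.5) * phi) - cmath.exp(-1j * (l_prime + 0.5) * phi))
        total += diff.imag / math.sin(phi / 2)
    return total * step / (2 * math.pi)


print(exact_probability(30, 0.3, 5, 12), integral_probability(30, 0.3, 5, 12))
```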


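It is convenient to introduce instead of $l$ and $l'$ two numbers $\zeta_1$ and $\zeta_2$
defined by

$$l = np + \frac{1}{2} + \zeta_1\sqrt{B_n}, \qquad l' = np - \frac{1}{2} + \zeta_2\sqrt{B_n}$$

where $B_n = npq$. Setting further

$$\theta = np\varphi + \chi,$$

$P$ can be presented as

$$P = P_2 - P_1$$

where $P_1$ and $P_2$ are obtained by taking $\zeta = \zeta_1$ and $\zeta = \zeta_2$ in the integral

$$J = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\sin\frac{\varphi}{2}}\,d\varphi.$$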

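3. Our next aim is to establish upper and lower limits for $R$.
Evidently $R = \rho^n$, where

$$\rho = (p^2 + q^2 + 2pq\cos\varphi)^{1/2} = \left(1 - 4pq\sin^2\frac{\varphi}{2}\right)^{1/2}.$$

Now

$$\log\rho = \frac{1}{2}\log\left(1 - 4pq\sin^2\frac{\varphi}{2}\right) = -2pq\sin^2\frac{\varphi}{2} - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots$$

whence

$$\log\rho < -2pq\sin^2\frac{\varphi}{2}.$$

Since $\varphi/2 \le \pi/2$, we have

$$\sin\frac{\varphi}{2} \ge \frac{\varphi}{\pi}$$

and consequently

$$\log\rho < -\frac{2pq\varphi^2}{\pi^2}, \qquad (2)\quad \rho < e^{-\frac{2pq\varphi^2}{\pi^2}}$$

for all values of $\varphi$ in the interval of integration. On the other hand, we
have

$$\sin\frac{\varphi}{2} > \frac{\varphi}{2} - \frac{\varphi^3}{48}$$

and

$$\sin^2\frac{\varphi}{2} > \frac{\varphi^2}{4} - \frac{\varphi^4}{48},$$

which gives another upper bound for $\rho$:

$$(3)\qquad \rho < e^{-\frac{pq\varphi^2}{2} + \frac{pq\varphi^4}{24}}.$$

The corresponding upper bounds for $R$ will be

$$(4)\qquad R < e^{-\frac{2B_n\varphi^2}{\pi^2}},$$

$$(5)\qquad R < e^{-\frac{B_n\varphi^2}{2} + \frac{B_n\varphi^4}{24}}.$$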


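To find a lower bound for $R$ we shall assume $\varphi \le \pi/2$. We can
present $\log\rho$ thus:

$$\log\rho = -\frac{pq\varphi^2}{2} - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} + 2pq\left(\frac{\varphi^2}{4} - \sin^2\frac{\varphi}{2}\right) - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots$$

On the other hand,

$$\frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} + \frac{1}{8}(4pq)^4\sin^8\frac{\varphi}{2} + \cdots < \frac{\frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2}}{1 - 4pq\sin^2\frac{\varphi}{2}} \le \frac{1}{3}(4pq)^3\sin^6\frac{\varphi}{2},$$

since $4pq\sin^2\frac{\varphi}{2} \le \frac{1}{2}$ for $\varphi \le \pi/2$, and

$$\frac{\varphi^2}{4} - \sin^2\frac{\varphi}{2} > \frac{1}{3}\sin^4\frac{\varphi}{2},$$

so that

$$2pq\left(\frac{\varphi^2}{4} - \sin^2\frac{\varphi}{2}\right) - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots > \frac{2pq}{3}\sin^4\frac{\varphi}{2} - \frac{1}{3}(4pq)^3\sin^6\frac{\varphi}{2} = \frac{2pq}{3}\sin^4\frac{\varphi}{2}\left(1 - 32p^2q^2\sin^2\frac{\varphi}{2}\right) \ge 0$$

and consequently

$$\log\rho > -\frac{pq\varphi^2}{2} - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} > -\frac{pq\varphi^2}{2} - \frac{p^2q^2\varphi^4}{4}$$

if $\varphi \le \pi/2$. Hence,

$$(6)\qquad R > e^{-\frac{B_n\varphi^2}{2} - \frac{B_n pq\varphi^4}{4}}$$

and this is valid for $\varphi \le \pi/2$.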
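4. Let $\tau$ be defined by

$$B_n\tau^4 = 3.$$

Assuming $B_n \ge 25$ from now on, we shall have

$$\tau = \left(\frac{3}{B_n}\right)^{1/4} \le \left(\frac{3}{25}\right)^{1/4} < 0.59$$

and a fortiori $\tau < \pi/2$. Let us suppose now that $\varphi$ varies in the interval
$0 \le \varphi \le \tau$. By inequality (6) we shall have

$$R\,e^{\frac{B_n\varphi^2}{2}} - 1 > e^{-\frac{B_n pq\varphi^4}{4}} - 1 > -\frac{B_n pq\varphi^4}{4} \ge -\frac{B_n\varphi^4}{16}$$

because $e^{-x} - 1 > -x$ for $x > 0$ and $pq \le \frac{1}{4}$.

On the other hand, using inequality (5), we find that

$$R\,e^{\frac{B_n\varphi^2}{2}} - 1 < e^{\frac{B_n\varphi^4}{24}} - 1 < \frac{B_n\varphi^4}{24}\,e^{\frac{B_n\tau^4}{24}} = \frac{B_n\varphi^4}{24}\,e^{\frac{1}{8}} < \frac{B_n\varphi^4}{16},$$

since $e^{1/8} < \frac{3}{2}$. From the two inequalities just established it follows that

$$(7)\qquad \left|R - e^{-\frac{B_n\varphi^2}{2}}\right| < \frac{1}{16}B_n\varphi^4 e^{-\frac{B_n\varphi^2}{2}}$$

in the interval

$$0 \le \varphi \le \tau.$$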
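A numerical spot check of inequality (7), assuming the definition $B_n\tau^4 = 3$ and the restriction $B_n \ge 25$ used above. To avoid cancellation for small $\varphi$, the sketch divides both sides by $e^{-B_n\varphi^2/2}$ and computes $(R - e^{-B_n\varphi^2/2})/e^{-B_n\varphi^2/2}$ with `math.log1p` and `math.expm1`, comparing it against $B_n\varphi^4/16$. The $(n, p)$ pairs are arbitrary choices with $B_n \ge 25$.

```python
import math


def log_R(phi, n, p):
    """log R = (n/2) log(1 - 4pq sin^2(phi/2))."""
    q = 1 - p
    return (n / 2) * math.log1p(-4 * p * q * math.sin(phi / 2) ** 2)


def relative_gap(phi, n, p):
    """(R - e^{-B phi^2/2}) / e^{-B phi^2/2}, computed without cancellation."""
    B = n * p * (1 - p)
    return math.expm1(log_R(phi, n, p) + B * phi**2 / 2)


def check_inequality_7(n, p, samples=2000):
    """Test |R - e^{-B phi^2/2}| < (1/16) B phi^4 e^{-B phi^2/2} on (0, tau], B tau^4 = 3."""
    B = n * p * (1 - p)
    tau = (3 / B) ** 0.25
    for k in range(1, samples + 1):
        phi = tau * k / samples
        if not abs(relative_gap(phi, n, p)) < B * phi**4 / 16:
            return False
    return True


CASES = [(100, 0.5), (200, 0.3), (300, 0.1), (2500, 0.02)]
for n, p in CASES:
    print(n, p, round(n * p * (1 - p), 2), check_inequality_7(n, p))
```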


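5. We turn now to the angle $\theta$. Evidently

$$\theta = n\,\mathrm{arc\,tg}\,\frac{p\sin\varphi}{q + p\cos\varphi} = n\psi, \qquad \psi = \mathrm{arc\,tg}\,\frac{p\sin\varphi}{q + p\cos\varphi}.$$

By successive derivations with respect to $\varphi$ we find

$$\frac{d\psi}{d\varphi} = \frac{p^2 + pq\cos\varphi}{p^2 + 2pq\cos\varphi + q^2}, \qquad \frac{d^2\psi}{d\varphi^2} = \frac{pq(p - q)\sin\varphi}{(p^2 + 2pq\cos\varphi + q^2)^2},$$

$$\frac{d^3\psi}{d\varphi^3} = pq(p - q)\,\frac{4pq + (1 - 2pq)\cos\varphi - 2pq\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^3},$$

$$\frac{d^4\psi}{d\varphi^4} = pq(p - q)\sin\varphi\,\frac{-1 + 4pq + 20p^2q^2 + 8pq(1 - 2pq)\cos\varphi - 4p^2q^2\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^4}$$

and for $\varphi = 0$

$$\left(\frac{d\psi}{d\varphi}\right)_0 = p, \qquad \left(\frac{d^2\psi}{d\varphi^2}\right)_0 = 0, \qquad \left(\frac{d^3\psi}{d\varphi^3}\right)_0 = pq(p - q).$$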

Furthermore, one easily verifies that in the interval $0 \le \varphi \le \pi/2$


kPo) 


\d^\ 


\dip^ 


'< gP9lP - «l(l - sin* 

2p?|p - ?|^1 - ipq sin* ^ v 


Hence, applying Taylor’s formula and supposing $0 \le \varphi \le \tau$, we get for $\chi$

$$(8)\qquad \chi = \frac{1}{6}B_n(p - q)\varphi^3 + M\varphi^4$$

where 


(9) \M\ < *^i*|p - g|(l - pgr*)*^, 


or 

(10) X = 
where 

(11) \L\ < ABnIp - g|(l - pgr*)-. 

Using inequalities (9) and (11), we easily find

$$(12)\qquad \sin(\zeta\sqrt{B_n}\,\varphi - \chi) = \sin(\zeta\sqrt{B_n}\,\varphi) - \frac{1}{6}B_n(p - q)\varphi^3\cos(\zeta\sqrt{B_n}\,\varphi) + r$$
where 


(13) \r\ < A5n|p - g|(l - pqT^)-^ip* + zUBKp - g)*(l - PTr^)~^<p\ 


provided $0 \le \varphi \le \tau$.

6. To find an appropriate expression of the integral $J$ we split it into
two integrals, $J_1$ and $J_2$, taken respectively between limits $0, \tau$ and $\tau, \pi$.
We have 
because sin ^ Let ti = t then by inequality (4)

Z T z 


2Bn , 

< r**i—= f" 

Jrt ^ Jr, <P J X/b; ^ 


But for positive $x$ the following inequality holds:


(14) 

consequently 


-“’dtt ^ c-** 


r 

J* M 

r'r>d«> ^ «-**•'* e-«v^* 

J,. "7 ^ 5.r> " ~3Br’ 


Noting that R{<p) is a decreasing function of (p we have for t ^ ^ n 

fi(^) g R{r) < |e-«v%. 


Hence, 


J'‘B^<|log^e-iv^, 


and combining this inequality with the one previously established, we 
have finally 


(15) 


W <(|log|+%^)e-'VK. 


7. More elaborate considerations are necessary to separate the
principal term and to estimate the error term in $J_1$. Making use of the
inequality 


1_1 

Ism X xi 


6 sin X 


we can present Ji thus: 


sin (fVBiiv) - x) 


where 


|A|<— 

48*- sin -*^0 


dtp + A 


R(pd<Pf 


and, because R < in the interval 0 < <p <t 


1A|< 




32t sin ^ 
Since t’ ^ we find by direct numerical calculation

- - - < 0.0206, 

2)2ar sin ^ 


and so, finally,

$$|\Delta| < 0.0206\,B_n^{-1}.$$


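8. Referring now to inequality (7), we can write

$$\frac{1}{2\pi}\int_0^{\tau} R\,\frac{2\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\varphi}\,d\varphi = \frac{1}{2\pi}\int_0^{\tau} e^{-\frac{B_n\varphi^2}{2}}\,\frac{2\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\varphi}\,d\varphi + \Delta_1$$

where

$$|\Delta_1| < \frac{B_n}{16\pi}\int_0^{\infty}\varphi^3 e^{-\frac{B_n\varphi^2}{2}}\,d\varphi = \frac{1}{8\pi B_n} < 0.04\,B_n^{-1}.$$

Combining this with the result of the preceding section, we can present
$J_1$ thus

$$(16)\qquad J_1 = \frac{1}{\pi}\int_0^{\tau} e^{-\frac{B_n\varphi^2}{2}}\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\varphi}\,d\varphi + \Delta_2,$$

and

$$|\Delta_2| < 0.0605\,B_n^{-1}.$$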

9. To simplify the integral in the right member of (16), we substitute
for $\sin(\zeta\sqrt{B_n}\,\varphi - \chi)$ its expression (12). Taking into account inequality
(13), we get (17):

A = I- - 

2jrJo ^ 2irJo <p 

n Bntp* _ 

Sir ~ Jo ^ ^ + As 

where 

lAsI < + 

+ - ?)*(! 

But 

j\ri‘>'*Vdv = 8B^’, 
and so

|A|| < “ 9l(l - + ^(P - 9)*(1 - PQr^)-‘B-\ 

Now ^ T* ^ B« ^ 25, consequently 

- p^.,- S < 0.0385. 

On the other hand, 

1 _ p^. ^ 1 _ I = g + ^(p - 5)., 

and for positive x the maximum of 

xHi + 

is attained for x* = ^J4st whence it follows that 

Taking into account all this, we have 

|A,| < 0.09|p - 

10. As to integrals in the right-hand member of (17) we can write 

(\?) I- f = |- f + A, 

^ 2irJo ^ ^Jo V 

(19) cos {S^/Wn<p)dv = 

= cos {iVB„<p)d^ + A, 

where 

|A4| < - 

Tjr V? 3ir 

and 

"c““*ti*dM < xe“*‘ 


because 
for I > 1, as can easily be proved. Finally, taking into account (15),
(16), (17), (18), (19), we get 


( 20 ) 


+ cos icVK^)dJ^ < 0 0e5 . + 0.09|p-g| ^ 


since for $B_n \ge 25$


3 X 1 


It now remains to evaluate definite integrals in (20). We have 

(21) I f = 2 r 

2rJo <P 2irJo U 

(22) cos (^VB:v)dv> = 

- I C 21^2 cog 
nJO 


^ P -Q 

&trVK-i 


Differentiating the well-known integral

$$\int_0^\infty e^{-ax^2}\cos bx\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}\;e^{-b^2/4a} \qquad (a > 0)$$

twice with respect to $b$, and after that substituting $a = \tfrac12$, $b = \xi$, we find for (22) this expression:

On the other hand, an integral of the type

$$L(a) = \int_0^\infty e^{-u^2/2}\,\frac{\sin au}{u}\,du$$

can be reduced to a so-called "probability integral." In fact, differentiation with respect to $a$ gives

$$L'(a) = \int_0^\infty e^{-u^2/2}\cos au\,du = \tfrac12\sqrt{2\pi}\;e^{-a^2/2},$$

and since $L(0) = 0$,

$$L(a) = \tfrac12\sqrt{2\pi}\int_0^a e^{-u^2/2}\,du.$$
Consequently, integral (21) can be reduced to

$$\frac{1}{\sqrt{2\pi}}\int_0^{\xi} e^{-u^2/2}\,du.$$
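As a sanity check on the two classical integrals used here, both can be evaluated numerically. The Python sketch below is an illustration added to the text, not part of the original argument; it assumes a composite Simpson rule on the truncated interval $[0, 12]$ (the neglected Gaussian tail is of order $10^{-30}$), and the helper names are hypothetical. It compares (i) $\int_0^\infty x^2 e^{-x^2/2}\cos bx\,dx$ with $\sqrt{\pi/2}\,(1-b^2)e^{-b^2/2}$, which is what two differentiations of the cosine integral with respect to $b$ give at $a = \tfrac12$, and (ii) $L(a)$ with $\tfrac12\sqrt{2\pi}\int_0^a e^{-u^2/2}\,du = \tfrac{\pi}{2}\operatorname{erf}(a/\sqrt2)$.

```python
import math

def simpson(f, lo, hi, n=20_000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

UPPER = 12.0  # the Gaussian tail beyond this point is negligible here

def cos_moment_numeric(b):
    """int_0^inf x^2 e^{-x^2/2} cos(bx) dx, by quadrature."""
    return simpson(lambda x: x * x * math.exp(-x * x / 2) * math.cos(b * x), 0.0, UPPER)

def cos_moment_closed(b):
    """Minus the second b-derivative of sqrt(pi/2) e^{-b^2/2}."""
    return math.sqrt(math.pi / 2) * (1 - b * b) * math.exp(-b * b / 2)

def L_numeric(a):
    """L(a) = int_0^inf e^{-u^2/2} sin(au)/u du; the integrand tends to a as u -> 0."""
    f = lambda u: a if u == 0 else math.exp(-u * u / 2) * math.sin(a * u) / u
    return simpson(f, 0.0, UPPER)

def L_closed(a):
    """(1/2) sqrt(2 pi) int_0^a e^{-u^2/2} du, written with erf."""
    return math.pi / 2 * math.erf(a / math.sqrt(2))

for v in (0.0, 0.5, 1.0, 2.0):
    print(f"{v:4.1f}  {cos_moment_numeric(v):.10f}  {cos_moment_closed(v):.10f}"
          f"  {L_numeric(v):.10f}  {L_closed(v):.10f}")
```

Each numerical value should agree with its closed form to roughly ten decimals. The factor $(1-b^2)e^{-b^2/2}$ has the same shape as the $(1-\xi^2)e^{-\xi^2/2}$ terms of the final formula.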

Having found an approximate expression of the integral $J$, after substituting in it $\xi_2$ and $\xi_1$ for $\xi$ and taking the difference of the results, we find the desired expression of $P$.

11. The result of this long and detailed investigation can be summarized as follows:

Theorem. Let $m$ be the number of occurrences of an event in a series of $n$ independent trials with the constant probability $p$. The probability $P$ of the inequalities

$$np + \tfrac12 + \xi_1\sqrt{npq} \le m \le np - \tfrac12 + \xi_2\sqrt{npq},$$

where the extreme members are integers, can be represented in the form

$$P = \frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2} e^{-u^2/2}\,du + \frac{q-p}{6\sqrt{2\pi npq}}\Big[(1-\xi_2^2)\,e^{-\xi_2^2/2} - (1-\xi_1^2)\,e^{-\xi_1^2/2}\Big] + \omega. \tag{23}$$

The error term $\omega$ satisfies the inequality

$$|\omega| < \frac{0.13 + 0.18\,|p-q|}{npq},$$

provided $npq \ge 25$.
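A quick numerical check of (23) for one moderate case may be helpful. In the following Python sketch (standard library only; the function names and the case $n = 200$, $p = 3/10$, so that $npq = 42$, are illustrative assumptions), the exact probability is summed with integer arithmetic. $\xi_1$ and $\xi_2$ are recovered from integer limits $A_1 \le m \le A_2$ through $A_1 = np + \tfrac12 + \xi_1\sqrt{npq}$ and $A_2 = np - \tfrac12 + \xi_2\sqrt{npq}$, and the difference is compared with the bound on $|\omega|$.

```python
import math
from fractions import Fraction

def exact_interval(n, p, lo, hi):
    """Exact P(lo <= m <= hi) for m ~ Binomial(n, p); p is a Fraction."""
    a, d = p.numerator, p.denominator
    total = sum(math.comb(n, m) * a**m * (d - a)**(n - m) for m in range(lo, hi + 1))
    return total / d**n                      # big-int true division, correctly rounded

def formula_23(n, p, A1, A2):
    """Right-hand member of (23), omega omitted, for integer limits A1 <= m <= A2."""
    p = float(p)
    q = 1 - p
    s = math.sqrt(n * p * q)
    xi1 = (A1 - 0.5 - n * p) / s
    xi2 = (A2 + 0.5 - n * p) / s
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    g = lambda x: (1 - x * x) * math.exp(-x * x / 2)
    main = Phi(xi2) - Phi(xi1)               # (1/sqrt(2 pi)) * integral from xi1 to xi2
    skew = (q - p) / (6 * math.sqrt(2 * math.pi) * s) * (g(xi2) - g(xi1))
    return main + skew

n, p = 200, Fraction(3, 10)                  # npq = 42 >= 25
exact = exact_interval(n, p, 50, 70)
approx = formula_23(n, p, 50, 70)
bound = (0.13 + 0.18 * abs(1 - 2 * float(p))) / (n * float(p) * (1 - float(p)))
print(f"exact {exact:.6f}  (23) {approx:.6f}  |diff| {abs(exact - approx):.2e}  bound {bound:.2e}")
```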

By slightly increasing the limit of the error term, this theorem can be put into a more convenient form. Let $t_1$ and $t_2$ be two arbitrary real numbers and let $P$ denote the probability of the inequalities

$$np + t_1\sqrt{npq} \le m \le np + t_2\sqrt{npq}.$$

If the greatest integers contained in

$$np + t_2\sqrt{npq} \quad\text{and}\quad nq - t_1\sqrt{npq}$$

are, respectively, $A_2$ and $A_1$, the preceding inequalities are equivalent to

$$n - A_1 \le m \le A_2.$$

To apply the theorem, we set

$$np - \tfrac12 + \xi_2\sqrt{npq} = A_2 = np + t_2\sqrt{npq} - \theta_2,$$
$$np + \tfrac12 + \xi_1\sqrt{npq} = n - A_1 = np + t_1\sqrt{npq} + \theta_1,$$

$\theta_2$ and $\theta_1$ being, respectively, the fractional parts of $np + t_2\sqrt{npq}$ and $nq - t_1\sqrt{npq}$. Hence,

$$\xi_2 = t_2 + \frac{\tfrac12 - \theta_2}{\sqrt{npq}}, \qquad \xi_1 = t_1 - \frac{\tfrac12 - \theta_1}{\sqrt{npq}}.$$



130 INTRODUCTION TO MATHEMATICAL PBOBABILITf [Chap. VII 


Applying Taylor’s formula, it is easy to verify that 




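whence, finally, we can draw the following conclusion: For any two real numbers $t_1 < t_2$, the probability of the inequalities

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

can be expressed as follows:

$$P = \frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-u^2/2}\,du + \frac{(\tfrac12 - \theta_1)\,e^{-t_1^2/2} + (\tfrac12 - \theta_2)\,e^{-t_2^2/2}}{\sqrt{2\pi npq}} + \frac{q-p}{6\sqrt{2\pi npq}}\Big[(1-t_2^2)\,e^{-t_2^2/2} - (1-t_1^2)\,e^{-t_1^2/2}\Big] + \Omega,$$

where $\theta_2$ and $\theta_1$ are the respective fractional parts of

$$np + t_2\sqrt{npq} \quad\text{and}\quad nq - t_1\sqrt{npq},$$

and

$$|\Omega| < \frac{0.20 + 0.25\,|p-q|}{npq},$$

provided $npq \ge 25$.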

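In particular, if $t_2 = -t_1 = t$, the probability of the inequality

$$|m - np| \le t\sqrt{npq}$$

is expressed by

$$P = \frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-t^2/2} + \Omega,$$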

with the same upper limit for $|\Omega|$. Laplace, supposing that $np + t\sqrt{npq}$ is an integer, in which case $\theta_2 = 0$ and $\theta_1$ is a fraction less than
gives for $P$ the approximate expression

$$P = \frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du + \frac{e^{-t^2/2}}{\sqrt{2\pi npq}}$$

without indicating the limit of the error. Evidently Laplace's formula coincides with the formula obtained here by a rigorous analysis, save for terms of the same order as the error term $\Omega$.
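The general form with $t_1$, $t_2$ and the fractional parts $\theta_1$, $\theta_2$ can be checked in the same way. In this Python illustration (hypothetical names, standard library only), the limits for $m$ are the smallest integer $\ge np + t_1\sqrt{npq}$ and the largest integer $\le np + t_2\sqrt{npq}$. $\theta_1$ and $\theta_2$ are computed as the fractional parts of $nq - t_1\sqrt{npq}$ and $np + t_2\sqrt{npq}$, exactly as in the text. The values of $t_1$, $t_2$ tried are arbitrary.

```python
import math
from fractions import Fraction

def approx_t1_t2(n, p, t1, t2):
    """P(t1*sqrt(npq) <= m - np <= t2*sqrt(npq)) by the formula with theta1, theta2 (Omega omitted)."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    theta2 = (n * p + t2 * s) % 1            # fractional part of np + t2*sqrt(npq)
    theta1 = (n * q - t1 * s) % 1            # fractional part of nq - t1*sqrt(npq)
    e = lambda x: math.exp(-x * x / 2)
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    r = math.sqrt(2 * math.pi) * s           # sqrt(2 pi npq)
    return (Phi(t2) - Phi(t1)
            + ((0.5 - theta1) * e(t1) + (0.5 - theta2) * e(t2)) / r
            + (q - p) / (6 * r) * ((1 - t2**2) * e(t2) - (1 - t1**2) * e(t1)))

def exact_t1_t2(n, p_frac, t1, t2):
    """Exact probability of the same inequalities, with integer arithmetic."""
    p = float(p_frac)
    s = math.sqrt(n * p * (1 - p))
    lo = math.ceil(n * p + t1 * s)
    hi = math.floor(n * p + t2 * s)
    a, d = p_frac.numerator, p_frac.denominator
    return sum(math.comb(n, m) * a**m * (d - a)**(n - m) for m in range(lo, hi + 1)) / d**n

n, p = 200, Fraction(3, 10)
for t1, t2 in [(-1.3, 0.9), (-2.0, 2.0), (-0.4, 1.7)]:
    ex = exact_t1_t2(n, p, t1, t2)
    ap = approx_t1_t2(n, float(p), t1, t2)
    print(f"t1 = {t1:5.2f}, t2 = {t2:5.2f}:  exact {ex:.6f}  approx {ap:.6f}")
```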
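To find an approximate expression for the probability $P$ of the inequality

$$\left|\frac{m}{n} - p\right| \le \varepsilon$$

it suffices to take

$$t = \varepsilon\sqrt{\frac{n}{pq}}.$$

Then

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{\varepsilon\sqrt{n/pq}} e^{-u^2/2}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-n\varepsilon^2/2pq} + \Omega,$$

and evidently $P$ tends to 1 as $n$ increases indefinitely. This is the second proof of Bernoulli's theorem.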
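The convergence to 1 is already visible in the leading term. The Python sketch below evaluates only $\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du$ with $t = \varepsilon\sqrt{n/pq}$ for increasing $n$. The values $p = 0.3$ and $\varepsilon = 0.02$ are arbitrary choices. The other two terms are deliberately left out, since they tend to 0 as $n$ grows.

```python
import math

def leading_term(n, p, eps):
    """(2/sqrt(2 pi)) * int_0^t e^{-u^2/2} du with t = eps * sqrt(n/(pq)), written with erf."""
    t = eps * math.sqrt(n / (p * (1 - p)))
    return math.erf(t / math.sqrt(2))

p, eps = 0.3, 0.02
sizes = (100, 1_000, 10_000, 100_000)
values = [leading_term(n, p, eps) for n in sizes]
for n, v in zip(sizes, values):
    print(f"n = {n:>7}: {v:.6f}")
```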

Referring to the above expression for the probability of the inequalities

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

and supposing that the number of trials $n$ increases indefinitely while $t_1$ and $t_2$ remain fixed, we immediately perceive the truth of the following limit theorem: The probability of the inequalities

$$t_1 \le \frac{m - np}{\sqrt{npq}} \le t_2$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-u^2/2}\,du$$

as $n$ tends to infinity.

This limit theorem is a very particular case of an extremely general theorem which we shall consider in Chap. XIV.

12. To form an idea of the accuracy to be expected by using the foregoing approximate formulas, it is worth while to take up a few numerical examples. Let $n = 200$, $p = q = \tfrac12$ and

$$95 \le m \le 105.$$

The exact expression of the probability that $m$ will satisfy these inequalities is


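$$P = \frac{200!}{100!\,100!\,2^{200}}\left[1 + 2\cdot\frac{100}{101} + 2\cdot\frac{100\cdot 99}{101\cdot 102} + 2\cdot\frac{100\cdot 99\cdot 98}{101\cdot 102\cdot 103} + 2\cdot\frac{100\cdot 99\cdot 98\cdot 97}{101\cdot 102\cdot 103\cdot 104} + 2\cdot\frac{100\cdot 99\cdot 98\cdot 97\cdot 96}{101\cdot 102\cdot 103\cdot 104\cdot 105}\right].$$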
The number in the brackets is found to be 9.995776 and its logarithm to five decimals

$$0.99982.$$

The logarithm of the first factor, again to five decimals, is

$$\bar{2}.75088,$$

whence

$$\log P = \bar{1}.75070, \qquad P = 0.56325,$$


and this value may be regarded as correct to five decimals. Let us see now what result is obtained by using approximate formulas. In our example

$$t\sqrt{npq} = t\sqrt{50} = 5, \qquad t = \frac{1}{\sqrt2} = 0.707107$$

and

$$\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du = 0.52050.$$

The additional term is

$$\frac{e^{-t^2/2}}{\sqrt{2\pi npq}} = \frac{e^{-1/4}}{\sqrt{100\pi}} = 0.04394,$$

and by Laplace's formula

$$P = 0.56444.$$

This is greater than the true value of $P$ by 0.00119. Now, the theoretical limit of the error is nearly

$$\frac{0.20}{50} = 0.004,$$

so that, actually, Laplace's formula gives an even closer approximation than can be expected theoretically.
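The figures of this example can be reproduced in a few lines. In the Python sketch below (an illustration; variable names are arbitrary), the exact sum uses integer arithmetic. $\theta_1 = \theta_2 = 0$ because $np + t\sqrt{npq} = 105$ and $nq + t\sqrt{npq} = 105$ are integers. $\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du$ is written as $\operatorname{erf}(t/\sqrt2)$.

```python
import math

n, npq = 200, 50                                    # p = q = 1/2
exact = sum(math.comb(n, m) for m in range(95, 106)) / 2**n
t = 5 / math.sqrt(npq)                              # t*sqrt(npq) = 5
main = math.erf(t / math.sqrt(2))                   # (2/sqrt(2 pi)) int_0^t e^{-u^2/2} du
extra = math.exp(-t * t / 2) / math.sqrt(2 * math.pi * npq)   # theta1 = theta2 = 0 here
laplace = main + extra
print(f"exact    {exact:.5f}")
print(f"main     {main:.5f}")
print(f"extra    {extra:.5f}")
print(f"Laplace  {laplace:.5f}   error {laplace - exact:.5f}")
```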

When $npq$ is large, the second term in Laplace's formula ordinarily is omitted and the probability is computed by using a simpler expression:

$$P = \frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du.$$

In our case this expression would give

$$P = 0.52050$$

instead of 0.56325, with an error of about 0.043, which amounts to about 8 per cent of the exact number. Such a comparatively large error is explained by the fact that in our example $npq = 50$ is not large enough. In practice, when $npq$ attains a few hundreds, the simplified expression for $P$ can be used when an accuracy of about two or three decimals is considered satisfactory. In general, the larger $t$ is, the better the approximation that can be expected.

For the second example, let us evaluate the probability that in 6,520 trials the relative frequency of an event with the probability $p = \tfrac35$ will differ from that probability by less than $\varepsilon = \tfrac{1}{50}$. To find $t$, we have the equation

$$t\sqrt{npq} = \varepsilon n,$$

where

$$n = 6520, \qquad p = \tfrac35, \qquad q = \tfrac25, \qquad \varepsilon = \tfrac{1}{50},$$

which gives

$$t = \frac{130.4}{\sqrt{1564.8}} = 3.2965,$$

and, correspondingly,

$$\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du = 0.999021.$$

Since $m$ satisfies the inequalities

$$3912 - 130.4 \le m \le 3912 + 130.4,$$

the fractions $\theta_1$ and $\theta_2$ are $\theta_1 = \theta_2 = 0.4$, and the additional term is

$$\frac{0.2}{\sqrt{3129.6\,\pi}}\;e^{-5.4334} = 0.000009.$$

Hence, the approximate value of $P$ is

$$P = 0.999030.$$


To judge what the error is, we can apply Markoff's method of continued fractions to find the limits between which $P$ lies. These limits are

0.999028 and 0.999044.

The result obtained by using an approximate formula is unusually good, which can be explained by the fact that in our example $t$ is a rather large number. Even the simplified formula gives 0.999021, very near the true value.
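The same check can be run for the second example. This Python sketch (illustrative names; it takes $p = 3/5$ exactly for the exact sum) recomputes $t$, the fractions $\theta_1$, $\theta_2$, and the approximate value. It also computes the exact probability of $3782 \le m \le 4042$, which is what $|m - 3912| < 130.4$ means for integer $m$.

```python
import math

n, p, q = 6520, 0.6, 0.4
npq = n * p * q                                   # 1564.8
half_width = 130.4                                # eps * n with eps = 1/50
t = half_width / math.sqrt(npq)
theta2 = (n * p + half_width) % 1                 # fractional part of 4042.4
theta1 = (n * q + half_width) % 1                 # fractional part of 2738.4
main = math.erf(t / math.sqrt(2))
extra = (1 - theta1 - theta2) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi * npq)
approx = main + extra

# Exact value with integer arithmetic: P(m) = C(n, m) 3^m 2^(n-m) / 5^n.
exact = sum(math.comb(n, m) * 3**m * 2**(n - m) for m in range(3782, 4043)) / 5**n
print(f"t = {t:.4f}, main = {main:.6f}, extra = {extra:.6f}")
print(f"approximate P = {approx:.6f}, exact P = {exact:.6f}")
```

If Markoff's limits quoted above are right, the printed exact value should lie between 0.999028 and 0.999044.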

Finally, let us apply our formulas to the solution of the inverse problem: How large should the number of trials be to secure a probability larger than a given fraction for the inequality

$$\left|\frac{m}{n} - p\right| \le \varepsilon\,?$$





Let us take, for example, $p = \tfrac13$, $\varepsilon = 0.01$ and the lower limit of probability 0.999. To find $n$ approximately, we first determine $t$ by the equation

$$\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du = 0.999,$$

which gives

$$t = 3.291.$$

Hence,

$$n = \frac{pq\,t^2}{\varepsilon^2} = 24{,}066,$$

approximately.

€ y 


We cannot be sure that this limit is precise, since an approximate formula was used. But it can serve as an indication that for $n$ exceeding this limit by a comparatively small amount, the probability in question will be $> 0.999$. For instance, let us take $n = 24{,}300$. The limits for $m$ being

$$8{,}100 - 243 \le m \le 8{,}100 + 243,$$

we find $t$ from the equation

$$t = \varepsilon\sqrt{\frac{n}{pq}} = 3.3068$$

and correspondingly

$$\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du = 0.999057.$$

The additional term in Laplace's formula being 0.000023, we find

$$P > 0.99908 - 0.00006 > 0.999.$$

Thus, 24,300 trials surely satisfy all the requirements.
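The inverse problem can also be worked numerically. This Python sketch (the helper `solve_t` is hypothetical; it bisects on `math.erf`) solves $\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}\,du = 0.999$ and forms $n = pq\,t^2/\varepsilon^2$. It then checks $n = 24{,}300$ both with Laplace's formula and with the exact binomial sum for $8{,}100 - 243 \le m \le 8{,}100 + 243$. Bisection gives $t \approx 3.2905$, so the estimate of $n$ comes out a few units below the figure in the text; either way it is only a first indication.

```python
import math

def solve_t(level, lo=0.0, hi=10.0):
    """Bisection for t with erf(t/sqrt 2) = level."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.erf(mid / math.sqrt(2)) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p, q, eps = 1 / 3, 2 / 3, 0.01
t = solve_t(0.999)
n_estimate = p * q * t * t / eps**2
print(f"t = {t:.4f}, n is about {n_estimate:.0f}")

n = 24_300
t_n = eps * math.sqrt(n / (p * q))
main_n = math.erf(t_n / math.sqrt(2))
extra_n = math.exp(-t_n**2 / 2) / math.sqrt(2 * math.pi * n * p * q)   # theta1 = theta2 = 0
# Exact: P(m) = C(n, m) 2^(n-m) / 3^n for p = 1/3.
exact_n = sum(math.comb(n, m) * 2**(n - m) for m in range(8100 - 243, 8100 + 244)) / 3**n
print(f"t = {t_n:.4f}, main = {main_n:.6f}, extra = {extra_n:.6f}, exact = {exact_n:.6f}")
```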


Problems for Solution 

1. Find approximately the probability that the number of successes will be contained between 2,910 and 3,090 in 9,000 independent trials with constant probability $\tfrac13$. Ans. 0.9570 with an error in absolute value $< 10^{-4}$ [using (23)].

2. In Buffon's experiment a coin was tossed 4,040 times, with the result that heads turned up 2,048 times. What would be the probability of having more than 2,050 or less than 1,990 heads? Ans. 0.337.

3. R. Wolf threw a pair of dice 100,000 times and noted that 83,533 times the numbers of points on the two dice were different. What is the probability of having such an event occur not less than 83,533 or not more than 83,133 times? Does the result suggest a doubt that for each die the probability of any number of points was $\tfrac16$? Ans. This probability is approximately 0.0898 and on account of its smallness some doubt may exist.
4. If the probability of an event $E$ is $\tfrac12$, what number of trials guarantees a probability of more than 0.999 that the difference between the relative frequency of $E$ and $\tfrac12$ will be in absolute value less than 0.01? Ans. 27,500.

5. If a man plays 10,000 equitable games, staking $1 in each game, what is the probability that the increase or decrease in his fortune will not exceed $20? $50?

Ans. (a) 0.166; (b) 0.390.

6. If a man plays 100,000 games of craps and stakes 50 cents in each game, what is the probability that he will lose less than $300? Ans. About 1/200.

7. Following the method developed in this chapter, prove the following formula for the probability of exactly $m$ successes in $n$ independent trials with constant probability $p$:

$$T_m = \frac{e^{-t^2/2}}{\sqrt{2\pi npq}}\left[1 + \frac{(q-p)(t^3 - 3t)}{6\sqrt{npq}}\right] + \Delta,$$

where $t$ is determined by the equation

$$m = np + t\sqrt{npq}$$
and 

{npq)i 

provided $npq \ge 25$.

8. Developments of this chapter can be greatly simplified if $p = q = \tfrac12$ (symmetrical case). In this case one can prove the following statement: The probability
of the inequalities 




n 1 

^ m ^ - 

2 2 



can be expressed as follows: 






(.It* - - ({■■» - 


+ A 


where |A| < l/2n* for n > 16. 

9. In case of "rare" events, the probability $p$ may be so small that even for a large number of trials the quantity $\lambda = np$ may be small; for example, 10 or less. In cases of this kind, approximation formulas of the type of Laplace's cannot be used with confidence. To meet such cases, Poisson proposed approximate formulas of a different character. Let $P_m$ represent the probability that in $n$ trials an event with the probability $p$ will occur not more than $m$ times. Show that

$$P_m = e^{-\lambda}\left(1 + \frac{\lambda}{1} + \frac{\lambda^2}{1\cdot 2} + \cdots + \frac{\lambda^m}{1\cdot 2\cdots m}\right) + \Delta = Q_m + \Delta,$$


where 


and 


$$|\Delta| < (e^{\omega} - 1)\,Q_m \quad\text{if } Q_m \ge \tfrac12,$$
$$|\Delta| < (e^{\omega} - 1)(1 - Q_m) \quad\text{if } Q_m < \tfrac12,$$


4 n 


X 


2(n - X) 





Indication of the Proof. We have 


Pm - r 


1-1 



(i 

) 1 


, \ v' 

V »/ 

\ n 

/X*" 

^ + ,+ 12,.+ 

* T 

1.2.3- 

- • m 

9*". 


Now, since $q = 1 - \lambda/n$,


(,. lY, iniV• - V "\ ■ 


and 

Consequently 

P, 

But 


^ A — * 

< e*-» S 


tly 

+ ... +——1 

Lll-2 1-2-3-mJ 


whence 

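$$P_m < e^{\omega}Q_m, \qquad Q_m = e^{-\lambda}\left(1 + \frac{\lambda}{1} + \frac{\lambda^2}{1\cdot 2} + \cdots + \frac{\lambda^m}{1\cdot 2\cdots m}\right).$$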

On the other hand,

$$1 - P_m = \sum_{\mu=m+1}^{n} \frac{n(n-1)\cdots(n-\mu+1)}{1\cdot 2\cdots \mu}\;p^{\mu}q^{n-\mu},$$


whence 

and 


qn-HpM 


<r 2 

M-m+l 


(-0(-a . 


1-2-3 


$$1 - P_m < e^{\omega}(1 - Q_m),$$

whence

$$P_m > e^{\omega}Q_m + 1 - e^{\omega}.$$

The final statement follows immediately from both inequalities obtained for $P_m$.

10. With the usual notation, show that

$$T_m = \frac{e^{-\lambda}\lambda^m}{m!}\,G,$$

where 


mX (n —m)X« m(m--l) | 
g « e » 2n* 2n 




3(n — X)* 2n(n 
Indication of the Proof. Referring to Chap. I, page 23, we have 


< « < 1 . 


But 


whence 


(-;)■ 


(' -^r 


^ (n —m)X* 

2n« . 


< « 


mX (n —m)X* TO(m—1) 
r* < • e ” 2n* 2n . 

m! 


< e 


m(m— 1) 
2n , 


On the other hand, 




. I I (n —m)X* (n —m)X* 

n-X"^ 2(n-X)»" 3(n-X)* 


m(m — 1) 
^ g 2(n-in). 


Hence 


(-r(-r 

and a fortiori 


> « 


-x+ 


mX (n — m)X* Tn(m — 1) (n — m)X» m* 


2n» 


2n -e 3(n-X)» 2n(n-m), 


mX (n —m)X* m(m — 
> « ■*“ n 2n* 2n 


-fi _ 

[ 3(n - X)> ■ 


2n(n 


-m)J 


If $\lambda$ and $m$ are both small in comparison to $n$, the above-introduced factor $G$ will be near 1. Under such circumstances we may be entitled to use an approximate formula due to Poisson:

$$T_m \approx \frac{e^{-\lambda}\lambda^m}{m!}.$$

The preceding elementary analysis gives means to estimate the error incurred by using this formula.

11. Apply the preceding considerations to the case $n = 1{,}000$, $p = \tfrac{1}{100}$, $\lambda = 10$ and $m = 10$. Ans. $0.1256 < T_{10} < 0.1258$. Poisson's formula gives 0.1251, a very good approximation. Also, $0.5807 < P_{10} < 0.5863$. Taking $P_{10} = 0.583$, the error in absolute value will be less than $3.3\cdot 10^{-3}$. By a more elaborate method it is found that $P_{10} = 0.5830$.
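Several of the numerical answers above can be checked directly. The following Python sketch is an illustration with hypothetical helper names. It computes exact binomial probabilities with integer arithmetic for Problems 1, 2, 5 and 11. It also compares the formula of Problem 7 (with $\Delta$ omitted) with exact point probabilities for one case, $n = 200$, $p = 1/5$, chosen so that $npq = 32 \ge 25$. In Problem 5 the gain after 10,000 games is taken as $2W - 10{,}000$, where $W$ is the number of games won. Problems 3 and 6 are not covered.

```python
import math

def binom_interval(n, num, den, lo, hi):
    """Exact P(lo <= m <= hi) for m ~ Binomial(n, num/den), using integer arithmetic."""
    total = sum(math.comb(n, m) * num**m * (den - num)**(n - m) for m in range(lo, hi + 1))
    return total / den**n

def local_formula(n, p, m):
    """Problem 7 without Delta: e^{-t^2/2}/sqrt(2 pi npq) * [1 + (q-p)(t^3-3t)/(6 sqrt(npq))]."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    t = (m - n * p) / s
    return math.exp(-t * t / 2) / (math.sqrt(2 * math.pi) * s) * (1 + (q - p) * (t**3 - 3 * t) / (6 * s))

prob1 = binom_interval(9000, 1, 3, 2910, 3090)           # Problem 1
prob2 = 1 - binom_interval(4040, 1, 2, 1990, 2050)       # Problem 2
prob5a = binom_interval(10_000, 1, 2, 4990, 5010)        # Problem 5: |2W - 10000| <= 20
prob5b = binom_interval(10_000, 1, 2, 4975, 5025)        # Problem 5: |2W - 10000| <= 50
worst7 = max(abs(binom_interval(200, 1, 5, m, m) - local_formula(200, 0.2, m))
             for m in range(25, 56))                     # Problem 7, |m - 40| <= 15
T10 = binom_interval(1000, 1, 100, 10, 10)               # Problem 11
P10 = binom_interval(1000, 1, 100, 0, 10)
poisson_T10 = math.exp(-10) * 10**10 / math.factorial(10)

print(f"1: {prob1:.4f}   2: {prob2:.4f}   5: {prob5a:.4f}, {prob5b:.4f}")
print(f"7: largest error of the local formula {worst7:.1e}")
print(f"11: T10 = {T10:.4f}, P10 = {P10:.4f}, Poisson T10 = {poisson_T10:.4f}")
```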
CHAPTER VIII


FURTHER CONSIDERATIONS ON GAMES OF CHANCE 

1. When a person undertakes to play a very large number of games 
under theoretically identical conditions, the inference to be drawn from 
Bernoulli's theorem is that that person will almost certainly be ruined 
if the mathematical expectation of his gain in a single game is negative. 
In case of a positive expectation, on the other hand, he is very likely to 
win as large a sum as he likes in a sufficiently long series of games. 
Finally, in an equitable game when the mathematical expectation of a 
gain is zero, the only inference to be drawn from Bernoulli’s theorem is 
that his gain or loss will likely be small in comparison with the number of 
games played. 

These conclusions are appropriate, however, only if it is possible to
continue the series of games indefinitely, with an agreement to postpone 
the final settling of accounts until the end of the series. But if the 
settlement, as in ordinary gambling, is made at the end of each game, 
it may happen that even playing a profitable game one will lose all his 
money and will have to discontinue playing long before the number of 
games becomes large enough to enable him to realize the advantages 
which continuation of the games would bring to him. 

A whole series of new problems arises in this connection, known as
problems on the duration of play or ruin of gamblers. Since the science 
of probability had its humble origin in computing chances of players in 
different games, the important question of the ruin of gamblers was 
discussed at a very early stage in the historical development of the 
theory of probability. The simplest problem of this kind was solved by 
Huygens, who in this field had such great successors as de Moivre, 
Lagrange, and Laplace. 

2. It is natural to attack the problem first in its simplest aspect, and 
then to proceed to more involved and difficult questions. 

Problem 1. Two players A and B play a series of games, the probability of winning a single game being $p$ for A and $q$ for B, and each game ends with a loss for one of them. If the loser after each game gives his adversary an amount representing a unit of money and the fortunes of A and B are measured by the whole numbers $a$ and $b$, what is the probability that A (or B) will be ruined if no limit is set for the number of games?

Solution. It is necessary first to show how we can attach a definite
numerical value to the probability of the ruin of A if no limit is set for 
the number of games. As in many similar cases (see, for instance, Prob. 
15, page 41) we start by supposing that a limit is set. Let n be this 
limit. There is only a finite number of mutually exclusive ways in which 
A can be ruined in n games; either he can be ruined just after the first 
game, or just after the second, and so on. Denoting by $p_1, p_2, \ldots, p_n$ the probabilities for A to be ruined just after the first, second, ..., $n$th game, the probability of his ruin before or at the $n$th game is

$$p_1 + p_2 + \cdots + p_n.$$

Now, this sum, being a probability, must remain $\le 1$ whatever $n$ is. On the other hand, each term of this sum is $\ge 0$ for the same reason. Both remarks combined show that the series

$$p_1 + p_2 + p_3 + \cdots$$

is convergent. We take its sum as the probability for A to be ruined when nothing limits the number of games played. So it is clear that this probability, although unknown, possesses a perfectly determined numerical value. Let us denote by $y_x$ the probability for A to be ruined when his fortune is $x$. The probability we seek is $y_a$. Obviously,

$$y_0 = 1, \tag{1}$$

for A is certainly ruined if he has no money left. Similarly

$$y_{a+b} = 0, \tag{2}$$

because if the fortune of A is $a + b$, it means that B has no money wherewith to play, and certainly the ruin of A is then impossible. Further, considering the result of the game immediately following the situation in which the fortune of A amounted to $x$, it is possible to establish an equation in finite differences which $y_x$ satisfies. For, if A wins this game (the probability of which case is $p$), his fortune becomes $x + 1$ and the probability of being ruined later is $y_{x+1}$. By the theorem of compound probability, the probability of this case is $p\,y_{x+1}$. But if A loses (the probability of which is $q$), his fortune becomes $x - 1$ and the probability that the one possessing this fortune will be ruined is $y_{x-1}$. The probability of this case is $q\,y_{x-1}$. Now, applying the theorem of total probability, we arrive at the equation

$$y_x = p\,y_{x+1} + q\,y_{x-1}. \tag{3}$$

This equation has a particular solution of the form $\alpha^x$, where $\alpha$ is a root of the equation

$$\alpha = p\alpha^2 + q.$$
If $p \ne q$, there are two roots

$$\alpha = 1, \qquad \alpha = \frac{q}{p},$$

and, correspondingly, there are two distinct particular solutions of equation (3):

$$1 \quad\text{and}\quad \left(\frac{q}{p}\right)^x.$$

Obviously,

$$y_x = C + D\left(\frac{q}{p}\right)^x$$

is also a solution of (3) for arbitrary $C$ and $D$. Now, we can dispose of $C$ and $D$ so as to satisfy conditions (1) and (2). To this end we have the equations

$$C + D = 1, \qquad C + \frac{q^{a+b}}{p^{a+b}}\,D = 0,$$

whence

$$C = \frac{q^{a+b}}{q^{a+b} - p^{a+b}}, \qquad D = -\frac{p^{a+b}}{q^{a+b} - p^{a+b}}$$

and

$$y_x = \frac{q^{a+b}p^x - p^{a+b}q^x}{p^x\,(q^{a+b} - p^{a+b})}.$$

It remains to take $x = a$ to obtain the required probability

$$y_a = \frac{q^a\,(q^b - p^b)}{q^{a+b} - p^{a+b}}$$

that the player A possessing the fortune $a$ will be ruined. Similarly, the probability of the ruin of B is

$$z_b = \frac{p^b\,(p^a - q^a)}{p^{a+b} - q^{a+b}}.$$

It turns out that

$$y_a + z_b = 1,$$

so that the probability that the series of games will continue indefinitely without A or B being ruined is 0. The probability 0 does not show the impossibility of an eternal game, because this number was obtained, not by direct enumeration of cases, but by passage to the limit. Theoretically, an eternal game is not excluded. Actually, of course, this possibility can be disregarded.
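The solution of Problem 1 lends itself to an exact check. In the Python sketch below (illustrative function names; exact arithmetic with `fractions.Fraction`), the closed form for $y_a$ is compared with a direct solution of (3) under conditions (1) and (2), and $y_a + z_b = 1$ is verified. A third function follows the game for a finite number of games. It shows the probability of ruin within that number of games increasing toward $y_a$, which is the passage to the limit used above to define $y_a$.

```python
from fractions import Fraction

def ruin_closed_form(p, a, b):
    """y_a from the text; p is a Fraction, a and b positive integers."""
    q = 1 - p
    if p == q:
        return Fraction(b, a + b)
    return q**a * (q**b - p**b) / (q**(a + b) - p**(a + b))

def ruin_by_recurrence(p, a, b):
    """Solve y_x = p*y_{x+1} + q*y_{x-1}, y_0 = 1, y_{a+b} = 0, exactly.

    With y_0 = 1 fixed, every solution is affine in s = y_1, so the recurrence
    y_{x+1} = (y_x - q*y_{x-1}) / p is run for s = 0 and s = 1 and s is then
    chosen to make y_{a+b} vanish.
    """
    q = 1 - p
    N = a + b
    def run(s):
        ys = [Fraction(1), Fraction(s)]
        for x in range(1, N):
            ys.append((ys[x] - q * ys[x - 1]) / p)
        return ys
    u, v = run(0), run(1)
    s = -u[N] / (v[N] - u[N])
    return u[a] + s * (v[a] - u[a])

def ruin_within(p, a, b, games):
    """Probability (float) that A is ruined within the given number of games."""
    q = 1 - p
    N = a + b
    dist, ruined = {a: 1.0}, 0.0
    for _ in range(games):
        new = {}
        for x, w in dist.items():
            for y, pr in ((x + 1, p), (x - 1, q)):
                if y == 0:
                    ruined += w * pr
                elif y < N:                    # y == N means B is ruined: stop following
                    new[y] = new.get(y, 0.0) + w * pr
        dist = new
    return ruined

p, a, b = Fraction(3, 5), 3, 7
y_a = ruin_closed_form(p, a, b)
z_b = ruin_closed_form(1 - p, b, a)            # B's ruin: roles of p, q and a, b exchanged
print(y_a, ruin_by_recurrence(p, a, b), y_a + z_b)
print([round(ruin_within(float(p), a, b, g), 6) for g in (10, 50, 200, 2000)], float(y_a))
```

With $p = 3/5$ and a very large $b$, `ruin_closed_form` stays just below $(q/p)^a = 8/27$, in line with the discussion of advantageous games that follows.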
If $p = q = \tfrac12$, so that each single game is equitable, the preceding solution must be modified. In this case, the above quadratic equation in $\alpha$ has two coincident roots $\alpha = 1$, and we have only one particular solution of (3), $y_x = 1$. But another particular solution in this case is $y_x = x$, so that we can assume

$$y_x = C + Dx$$

and determine $C$ and $D$ from the equations

$$C = 1, \qquad C + D(a + b) = 0.$$

Thus, we find that

$$y_x = \frac{a + b - x}{a + b},$$

and for $x = a$

$$y_a = \frac{b}{a + b}.$$

Similarly, giving $z_b$ the same meaning as above,

$$z_b = \frac{a}{a + b}.$$

If, therefore, each single game is equitable, the probabilities of ruin are 
inversely proportional to the fortunes of the players. The practical 
conclusion to be derived from this theoretical result is sheer common 
sense: It is unwise to play indefinitely with an adversary whose fortune 
is very large without submitting oneself to the great risk of losing all 
one's money in the course of the games, even if each single game is
equitable. Gamblers who gamble at an even game with any willing 
individual are in the same condition as if they were gambling with an 
infinitely rich adversary. Their ruin in the long run is practically 
certain. 

If single games of the series are not equitable, that is, $p \ne q$, the conclusion may be different. Supposing $p > q$, we have a case when the expectation of A is positive; in each single game, A has an advantage over his adversary. The above expression for $y_a$ may be written in the form

$$y_a = \left(\frac{q}{p}\right)^a \frac{1 - (q/p)^b}{1 - (q/p)^{a+b}},$$

and, because $q/p < 1$, it is easy to see that $y_a$ remains always less than

$$\left(\frac{q}{p}\right)^a$$

and converges to this number when $b$ becomes infinite. Thus, playing a series of advantageous games even against an infinitely rich adversary, the probability of escaping ruin is

$$1 - \left(\frac{q}{p}\right)^a.$$
If a is large enough, this can be made as near 1 as we please, so that a 
player with a large fortune has good reason to believe that in the course 
of the games he will never be ruined, but that actually he is very likely 
to win a large sum of money. 

This conclusion again is confirmed by experience. Big gambling 
institutions, like the Casino at Monte Carlo, always reserve certain 
advantages to themselves, and, although they are willing to play with 
practically everybody (as if they played against an infinitely rich adversary),
the chance of their being ruined is slight because of the large
capital in their possession. 

3. In the problem solved above the stakes of both players were 
supposed to be equal, and we took them as units to measure the fortunes 
of both players. Next it would be interesting to investigate the case in 
which the stakes of A and B are unequal. An exact solution of this 
modified problem, since it depends on a difference equation of higher 
order, would be too complicated to be of practical use. It is therefore 
extremely interesting that, following an ingenious method developed by 
A. A. Markoff, one can establish simple inequalities for the required 
probabilities which give a good approximation if the fortunes of the 
players are large in comparison with their stakes. 

Problem 2. If the conditions presupposed in Prob. 1 are modified, in that the stakes of A and B measured in a convenient unit are $\alpha$ and $\beta$ and their respective fortunes are $a$ and $b$, find the probabilities for A or B to be ruined in the sense that at a certain stage the capital of A will become less than $\alpha$ or that of B less than $\beta$.

Solution. Let $y_x$ be the probability for A to be forced out of the game by the lack of sufficient money to set a full stake $\alpha$ when his fortune amounts to $x$, and consequently that of his adversary is $a + b - x$. In the same way as before, we find that $y_x$ is a solution of the equation in finite differences:

$$y_x = p\,y_{x+\beta} + q\,y_{x-\alpha}. \tag{4}$$

To determine $y_x$ completely, in addition to (4), we have two sets of supplementary conditions:

$$y_0 = y_1 = \cdots = y_{\alpha-1} = 1, \tag{5}$$

$$y_{a+b} = y_{a+b-1} = \cdots = y_{a+b-(\beta-1)} = 0. \tag{6}$$
Equation (5) expresses the fact that if the fortune of A becomes less than his stake, it is certain that A must quit. On the contrary, equation (6) indicates the impossibility for A to be ruined if the other player B does not have enough money to continue gaming. Equation (4) is an ordinary equation in finite differences of the order $\alpha + \beta$. It has particular solutions of the form $\theta^x$, where $\theta$ is a root of the equation

$$p\theta^{\alpha+\beta} - \theta^{\alpha} + q = 0. \tag{7}$$

The left-hand member for $\theta = 0$ is positive and with increasing $\theta$ decreases and attains a minimum when

$$\theta^{\beta} = \frac{\alpha}{p(\alpha + \beta)},$$

and then steadily increases and assumes positive values for large $\theta$. This minimum must be negative or zero because $\theta = 1$ is a root of (7). Now, if it is negative, there are two positive roots of (7). One of them is $\theta = 1$ and the other is $> 1$ or $< 1$ according as

$$p < \frac{\alpha}{\alpha + \beta} \quad\text{or}\quad p > \frac{\alpha}{\alpha + \beta},$$

or else

$$p\beta - q\alpha < 0 \quad\text{or}\quad p\beta - q\alpha > 0.$$

That is, the positive root of (7) different from 1 is $> 1$ when single games are favorable to B and $< 1$ if they are favorable to A. In case of equitable games, both positive roots coincide and $\theta = 1$ is a double root of (7). All the other roots of (7) are negative or imaginary.

The regular way to solve the problem would be to write down the general solution of (4) involving $\alpha + \beta$ arbitrary constants to be determined by conditions (5) and (6). As this method would lead to a complicated expression for $y_x$, we shall refrain from seeking the exact solution of our problem, and instead, following A. A. Markoff's ingenious remark, we shall establish simple lower and upper limits for $y_x$ which are close enough if the fortunes of the players are large in comparison with their stakes.

Lemma. If $y_x$ is a solution of equation (4) and none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}; \qquad y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative, then $y_x \ge 0$ for $x = 0, 1, 2, \ldots, a + b$.

Proof. Let $u_x^{(k)}$ $(k = 0, 1, 2, \ldots, \alpha - 1)$ represent the probability that the player A whose actual fortune is $x$ (and that of his adversary $a + b - x$) will be forced to quit when his fortune becomes exactly $k$. Evidently $u_x^{(k)}$ is a solution of equation (4) satisfying the conditions

$$u_x^{(k)} = 0 \ \text{ for } x = 0, 1, \ldots, k - 1, k + 1, \ldots, \alpha - 1;\ a + b, a + b - 1, \ldots, a + b - \beta + 1; \qquad u_k^{(k)} = 1.$$

Similarly, if $v_x^{(l)}$ $(l = 0, 1, 2, \ldots, \beta - 1)$ represents the probability that the player B will be forced to quit when the fortune of A becomes exactly $a + b - l$, then $v_x^{(l)}$ will be a solution of (4) satisfying the conditions

$$v_x^{(l)} = 0 \ \text{ for } x = 0, 1, 2, \ldots, \alpha - 1;\ a + b, \ldots, a + b - l + 1, a + b - l - 1, \ldots, a + b - \beta + 1; \qquad v_{a+b-l}^{(l)} = 1.$$

Thus we get $\alpha + \beta$ particular solutions of (4), and it is almost evident that these solutions are independent. Moreover, since they represent probabilities, $u_x^{(k)} \ge 0$, $v_x^{(l)} \ge 0$ for $x = 0, 1, 2, \ldots, a + b$. Now, any solution $y_x$ of (4) with given values of

$$y_0, y_1, \ldots, y_{\alpha-1}; \qquad y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

can be represented thus:

$$y_x = \sum_{k=0}^{\alpha-1} y_k\,u_x^{(k)} + \sum_{l=0}^{\beta-1} y_{a+b-l}\,v_x^{(l)}.$$

Hence, $y_x \ge 0$ for $x = 0, 1, 2, \ldots, a + b$ if none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}; \qquad y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative. This interesting property of the solutions of equation (4), derived almost intuitively from the consideration of probabilities, can be established directly. (See Prob. 9, page 160.)

The lemma just proved yields almost immediately the following proposition: If for any two solutions $y'$ and $y''$ of equation (4) the inequality

$$y''_x \ge y'_x$$

holds for

$$x = 0, 1, 2, \ldots, \alpha - 1;\ a + b, a + b - 1, \ldots, a + b - \beta + 1,$$

the same inequality will be true for all $x = 0, 1, 2, \ldots, a + b$. It suffices to notice that $y_x = y''_x - y'_x$ is a solution of the linear equation (4) and, by hypothesis, $y_x \ge 0$ for $x = 0, 1, 2, \ldots, \alpha - 1;\ a + b, a + b - 1, \ldots, a + b - \beta + 1$.

Now we can come back to our problem. First, if the mathematical expectation of A

$$p\beta - q\alpha$$

is different from 0, equation (7) has two positive roots: 1 and $\theta$. With arbitrary constants $C$ and $D$

$$y'_x = C + D\theta^x$$

is a solution of (4). Whatever $C$ and $D$ may be, $y'_x$ as a function of $x$ varies monotonically. Therefore, if $C$ and $D$ are determined by the conditions

$$y'_0 = 1, \qquad y'_{a+b-\beta+1} = 0,$$

we shall have

$$y'_x \le 1 \quad\text{if } x = 0, 1, 2, \ldots, \alpha - 1,$$
$$y'_x \le 0 \quad\text{if } x = a + b - \beta + 1, \ldots, a + b,$$

and by the above established lemma, taking into account conditions (5) and (6), we shall have for the required probability the following inequality

$$y_x \ge y'_x,$$

or, substituting the explicit expression for $y'_x$,

$$y_x \ge \frac{\theta^{a+b-\beta+1} - \theta^x}{\theta^{a+b-\beta+1} - 1}.$$

If, on the contrary, $C$ and $D$ are determined by

$$y''_{\alpha-1} = 1, \qquad y''_{a+b} = 0,$$

we shall have

$$y''_x \ge 1 \quad\text{if } x = 0, 1, 2, \ldots, \alpha - 1,$$
$$y''_x \ge 0 \quad\text{if } x = a + b - \beta + 1, \ldots, a + b,$$

and

$$y_x \le \frac{\theta^{a+b-\alpha+1} - \theta^{x-\alpha+1}}{\theta^{a+b-\alpha+1} - 1}.$$

Finally, taking $x = a$, we obtain the following limits for the initial probability $y_a$:

$$\frac{\theta^a\,(\theta^{b-\beta+1} - 1)}{\theta^{a+b-\beta+1} - 1} \le y_a \le \frac{\theta^{a-\alpha+1}\,(\theta^b - 1)}{\theta^{a+b-\alpha+1} - 1}.$$

They give a sufficient approximation to $y_a$ if $a$ and $b$ are large compared with $\alpha$ and $\beta$.

If each single game is equitable, equation (4) has a solution with two arbitrary constants:

$$y'_x = C + Dx.$$

Proceeding in the same way as before, we obtain the inequalities

$$\frac{b - \beta + 1}{a + b - \beta + 1} \le y_a \le \frac{b}{a + b - \alpha + 1}.$$
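Markoff's limits are easy to test against a numerical solution of (4) to (6). The Python sketch below is an illustration under stated assumptions. The root $\theta$ of (7) is located by bisection on the side of the minimum where it must lie. The "exact" $y_a$ is obtained by Gauss-Seidel iteration of (4) with the boundary values (5) and (6), a numerical stand-in rather than the general solution mentioned in the text. The parameter sets (one favorable to A, one favorable to B, one equitable) are arbitrary, and the function names are hypothetical.

```python
def other_root(p, alpha, beta):
    """Positive root theta != 1 of (7): p*theta**(alpha+beta) - theta**alpha + q = 0.

    Assumes p*beta != q*alpha, so that such a root exists.
    """
    q = 1 - p
    g = lambda th: p * th ** (alpha + beta) - th ** alpha + q
    th_min = (alpha / (p * (alpha + beta))) ** (1 / beta)   # where g attains its minimum
    favourable_to_A = th_min < 1
    if favourable_to_A:          # root in (0, th_min), where g decreases
        lo, hi = 0.0, th_min
    else:                        # root beyond th_min, where g increases
        lo, hi = th_min, 2 * th_min
        while g(hi) < 0:
            hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if (g(mid) > 0) == favourable_to_A:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ruin(p, alpha, beta, a, b, sweeps=100_000, tol=1e-14):
    """Numerical solution of (4) with boundary values (5), (6) by Gauss-Seidel iteration."""
    q = 1 - p
    N = a + b
    y = [1.0 if x < alpha else 0.0 for x in range(N + 1)]
    for _ in range(sweeps):
        change = 0.0
        for x in range(alpha, N - beta + 1):
            new = p * y[x + beta] + q * y[x - alpha]
            change = max(change, abs(new - y[x]))
            y[x] = new
        if change < tol:
            break
    return y[a]


def markoff_bounds(p, alpha, beta, a, b):
    """Lower and upper limits for y_a given in the text."""
    q = 1 - p
    if abs(p * beta - q * alpha) < 1e-12:      # equitable games
        return (b - beta + 1) / (a + b - beta + 1), b / (a + b - alpha + 1)
    th = other_root(p, alpha, beta)
    lower = th**a * (th**(b - beta + 1) - 1) / (th**(a + b - beta + 1) - 1)
    upper = th**(a - alpha + 1) * (th**b - 1) / (th**(a + b - alpha + 1) - 1)
    return lower, upper


for p, alpha, beta, a, b in [(0.7, 2, 1, 10, 10), (0.3, 1, 2, 10, 10), (2 / 3, 2, 1, 10, 10)]:
    lower, upper = markoff_bounds(p, alpha, beta, a, b)
    y = exact_ruin(p, alpha, beta, a, b)
    print(f"p = {p:.3f}, alpha = {alpha}, beta = {beta}:  {lower:.6f} <= {y:.6f} <= {upper:.6f}")
```

When $\alpha = \beta = 1$ the two limits coincide with each other and with the exact solution of Problem 1, which gives a convenient consistency check.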
4. To simplify the analysis, it was supposed that nothing limited the number of games played by A and B, so that an "eternal game," although extremely improbable, was theoretically possible. We now turn to problems in which the number of games is limited.

Problem 3. Players A and B agree to play not more than n games. 
The probabilities of winning a single game are p and q, respectively, and 
the stakes are equal. Taking these stakes as monetary units, the fortune 
of A is measured by the whole number a and that of B is infinite or at 
least so large that he cannot be ruined in n games. What is the probability
for A to be ruined in the course of n games?

Solution. Let $y_{x,t}$ represent the probability for A to be ruined when his fortune is measured by the number $x$ and he cannot play more than $t$ games. The reasoning we have used several times shows that $y_{x,t}$ satisfies a partial equation in finite differences:

$$y_{x,t} = p\,y_{x+1,t-1} + q\,y_{x-1,t-1}. \tag{8}$$

Moreover, if A has no money left, his ruin is certain, which gives the condition

$$y_{0,t} = 1 \quad\text{if } t \ge 0. \tag{9}$$

On the other hand, if A still possesses money and cannot play any more, his ruin is impossible, so that

$$y_{x,0} = 0 \quad\text{if } x > 0. \tag{10}$$

Conditions (9) and (10) together with equation (8) determine $y_{x,t}$ completely for all positive values of $x$ and $t$. To find an explicit expression for $y_{x,t}$ we shall use Lagrange's method. Equation (8) has particular solutions of the form

$$\alpha^x\beta^t,$$

where $\alpha$ and $\beta$ satisfy the relation

$$\alpha\beta = p\alpha^2 + q.$$

We can solve this equation either for $\beta$ or for $\alpha$, which leads to two different expressions of $y_{x,t}$. Solving for $\beta$, we have infinitely many particular solutions

$$\alpha^x\,(p\alpha + q\alpha^{-1})^t$$

with an arbitrary $\alpha$, and we can seek to obtain the required solution in the form

$$y_{x,t} = \frac{1}{2\pi i}\oint_c \alpha^{x-1}\,(p\alpha + q\alpha^{-1})^t f(\alpha)\,d\alpha,$$
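where $f(\alpha)$ is supposed to be developable in Laurent's series on a certain circle $c$. To satisfy (10) we must have

$$\frac{1}{2\pi i}\oint_c \alpha^{x-1} f(\alpha)\,d\alpha = 0 \quad\text{for } x = 1, 2, 3, \ldots,$$

which shows that $f(\alpha)$ is regular within the circle $c$. To determine $f(\alpha)$ completely, we must have, according to (9),

$$\frac{1}{2\pi i}\oint_c \alpha^{-1}(p\alpha + q\alpha^{-1})^t f(\alpha)\,d\alpha = 1 \quad\text{for } t = 0, 1, 2, \ldots.$$

All these equations are equivalent to a single equation

$$\frac{1}{2\pi i}\oint_c \frac{f(\alpha)\,d\alpha}{\alpha - \varepsilon(p\alpha^2 + q)} = \frac{1}{1 - \varepsilon}$$

holding good for all sufficiently small $\varepsilon$. The integrand has a single pole $\alpha_0$ within $c$ defined by

$$\alpha_0 - p\varepsilon\alpha_0^2 - q\varepsilon = 0,$$

and the corresponding residue is

$$\frac{f(\alpha_0)}{1 - 2p\varepsilon\alpha_0} = \frac{(q + p\alpha_0^2)\,f(\alpha_0)}{q - p\alpha_0^2}.$$

But this must be equal to

$$\frac{1}{1 - \varepsilon}$$

or, substituting for $\varepsilon$ its expression in $\alpha_0$,

$$\frac{q + p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q},$$

and hence, for all sufficiently small $\alpha_0$,

$$f(\alpha_0) = \frac{q - p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q};$$

that is, if

$$f(\alpha) = \frac{q - p\alpha^2}{p\alpha^2 - \alpha + q},$$

all the requirements are satisfied. Taking into account that $p + q = 1$, we have

$$f(\alpha) = -1 + \frac{1}{1 - \alpha} + \frac{1}{1 - (p/q)\,\alpha},$$

and also

$$f(\alpha) = 1 + \sum_{n=1}^{\infty}\left[1 + \left(\frac{p}{q}\right)^n\right]\alpha^n.$$

The expression for $y_{x,t}$ is therefore

$$y_{x,t} = \frac{1}{2\pi i}\oint_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t \sum_{n=0}^{\infty} C_n\alpha^n\,d\alpha,$$

where $C_0 = 1$ and $C_n = 1 + (p/q)^n$ if $n \ge 1$.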

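It remains to find the coefficient of \(1/\alpha\) in the development of the integrand in a series of descending powers of \(\alpha\). Since

\[ \alpha^{x-1}(p\alpha + q\alpha^{-1})^t = \sum_{l=0}^{t}\binom{t}{l} p^l q^{t-l}\,\alpha^{x+2l-t-1}, \]

this coefficient is given by the sum

\[ \sum_{l}\binom{t}{l} p^l q^{t-l}\,C_{t-x-2l}, \]

extended over all integers \(l\) from 0 up to the greatest integer not exceeding \((t - x)/2\). Hence, the final expression for the probability \(y_{a,n}\) is

\[ y_{a,n} = \sum_{l=0}^{\lfloor (n-a)/2 \rfloor}\binom{n}{l}\left[p^l q^{n-l} + p^{n-a-l} q^{a+l}\right], \tag{11} \]

with the agreement, in case of an even \(n - a\), to replace the sum \(p^l q^{n-l} + p^{n-a-l} q^{a+l}\) corresponding to \(l = (n - a)/2\), whose two terms are then equal, by the single term \(p^{(n-a)/2} q^{(n+a)/2}\) (because \(C_0 = 1\)). It is natural that the right-hand member of the preceding expression should be replaced by 0 if \(n < a\), which is in perfect agreement with the fact that A cannot be ruined in fewer than \(a\) games.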
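Formula (11) is easy to check numerically. The sketch below is an illustration of our own, not part of the original, and the function names `ruin_by_recursion` and `ruin_formula_11` are hypothetical. It evaluates (11) in exact rational arithmetic, applying the stated agreement when \(n - a\) is even, and compares the result with a direct solution of equation (8) under conditions (9) and (10).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def ruin_by_recursion(a, n, p):
    """y_{a,n} from equation (8) with conditions (9) and (10)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)          # condition (9): A is already ruined
        if t == 0:
            return Fraction(0)          # condition (10): no games left
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)   # equation (8)

    return y(a, n)


def ruin_formula_11(a, n, p):
    """y_{a,n} from formula (11); p is A's probability of winning a game."""
    q = 1 - p
    if n < a:
        return Fraction(0)              # A cannot be ruined in fewer than a games
    total = Fraction(0)
    for l in range((n - a) // 2 + 1):
        if 2 * l == n - a:
            # the agreement for even n - a: the two equal terms count only once
            total += comb(n, l) * p**l * q**(n - l)
        else:
            total += comb(n, l) * (p**l * q**(n - l) + p**(n - a - l) * q**(a + l))
    return total


p = Fraction(2, 5)
print(ruin_formula_11(3, 9, p), ruin_by_recursion(3, 9, p))
```

Both printed values should be the same fraction. Other values of \(a\), \(n\) and \(p\) can be tried in the same way.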

The second form of solution is obtained if we express \(\alpha\) as a function of \(\beta\). The equation

\[ p\alpha^2 - \alpha\beta + q = 0 \]

has two roots. We shall take for \(\alpha\) the root

\[ \alpha = \frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}, \]

determined by the condition that it vanishes for infinitely large positive \(\beta\) and can be developed in a power series of \(1/\beta\) when \(|\beta| > 2\sqrt{pq}\). Using \(\alpha\) in this perfectly determined sense, it is easy to verify that

\[ y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^x \beta^t\,\frac{d\beta}{\beta - 1}, \]

where \(c\) is a circle of radius \(> 1\) described from 0 as its center, satisfies all the requirements. For it is a solution of equation (8). Next, for \(x = 0\) and \(t \ge 0\),

\[ y_{0,t} = \frac{1}{2\pi i}\int_c \frac{\beta^t\,d\beta}{\beta - 1} = 1, \]

and, finally, for \(t = 0\) and \(x > 0\),

\[ y_{x,0} = \frac{1}{2\pi i}\int_c \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{d\beta}{\beta - 1} = 0, \]

because the development of the integrand into a power series of \(1/\beta\) starts at least with the second power of \(1/\beta\).

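To find \(y_{x,t}\) in explicit form, it remains to find the coefficient of \(1/\beta\) in the development of

\[ \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{\beta^t}{\beta - 1} = \alpha^x\left(\beta^{t-1} + \beta^{t-2} + \cdots + \beta + 1 + \frac{1}{\beta} + \frac{1}{\beta^2} + \cdots\right) \]

in a series of descending powers of \(\beta\). Let

\[ \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x = \frac{l_x}{\beta^x} + \frac{l_{x+1}}{\beta^{x+1}} + \cdots; \]

multiplying this series by

\[ \beta^{t-1} + \beta^{t-2} + \cdots + \beta + 1 + \frac{1}{\beta} + \cdots, \]

we find that the coefficient of \(1/\beta\) in the product is

\[ l_x + l_{x+1} + \cdots + l_t, \]

and hence

\[ y_{x,t} = l_x + l_{x+1} + \cdots + l_t, \]

provided \(t \ge x\), for otherwise \(y_{x,t} = 0\). The quadratic equation in \(\alpha\) can be written in the form

\[ \alpha = \frac{1}{\beta}\left(q + p\alpha^2\right), \]

and the development of any power of its root vanishing for \(\beta = \infty\) into a power series of \(1/\beta\) can be obtained by application of Lagrange's series. We have

\[ l_n = \frac{x}{n!}\left[\frac{d^{\,n-1}}{dz^{\,n-1}}\, z^{x-1}(q + pz^2)^n\right]_{z=0}; \]

but

\[ \frac{1}{n!}\left[\frac{d^{\,n-1}}{dz^{\,n-1}}\, z^{x-1}(q + pz^2)^n\right]_{z=0} = \frac{(x + 2i - 1)!}{i!\,(x + i)!}\,p^i q^{x+i} \]

if \(n = x + 2i\), and \(= 0\) if \(n = x + 2i + 1\). Hence,

\[ l_{x+2i} = x\,\frac{(x + 2i - 1)!}{i!\,(x + i)!}\,p^i q^{x+i}, \qquad l_{x+2i+1} = 0, \]

and finally

\[ y_{a,n} = q^a\left[1 + \frac{a}{1}\,pq + \frac{a(a+3)}{1\cdot 2}\,(pq)^2 + \cdots + \frac{a(a+k+1)(a+k+2)\cdots(a+2k-1)}{1\cdot 2\cdots k}\,(pq)^k\right], \tag{12} \]

where \(k = \frac{n-a}{2}\) or \(k = \frac{n-a-1}{2}\) according as \(n\) and \(a\) are of the same parity or not.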
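In the same spirit, formula (12) can be compared with recursion (8). This is a minimal sketch under our own conventions: exact fractions, \(a \ge 1\), and hypothetical function names. It sums the nonzero coefficients \(l_a, l_{a+2}, \ldots\) whose index does not exceed \(n\). The general coefficient is written as \(a(a+2i-1)!/(i!\,(a+i)!)\), so that the first term equals 1.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


def ruin_by_recursion(a, n, p):
    """y_{a,n} from equation (8) with conditions (9) and (10)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)

    return y(a, n)


def ruin_formula_12(a, n, p):
    """y_{a,n} from formula (12), i.e. l_a + l_{a+2} + ... (requires a >= 1)."""
    q = 1 - p
    if n < a:
        return Fraction(0)
    k = (n - a) // 2              # (n-a)/2 or (n-a-1)/2 according to parity
    total = Fraction(0)
    for i in range(k + 1):
        # a(a+2i-1)!/(i!(a+i)!) = a(a+i+1)...(a+2i-1)/i!, which is 1 for i = 0
        coeff = Fraction(a * factorial(a + 2 * i - 1), factorial(i) * factorial(a + i))
        total += coeff * (p * q) ** i
    return q**a * total


p = Fraction(1, 3)
print(ruin_formula_12(4, 10, p), ruin_by_recursion(4, 10, p))
```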

5. The difference \(y_{a,n} - y_{a,n-1}\) gives the probability for the player A to be ruined at exactly the \(n\)th game and not before. This difference is 0 if \(n\) differs from \(a\) by an odd number, so the probability of ruin at the \((a + 2i - 1)\)st game is 0. That is almost evident: after every game the fortune of A is increased or diminished by 1, so it can be reduced to 0 only if the number of games played is of the same parity as \(a\). If \(n = a + 2i\), the difference \(y_{a,n} - y_{a,n-1}\) is

\[ \frac{a(a+i+1)(a+i+2)\cdots(a+2i-1)}{1\cdot 2\cdot 3\cdots i}\,p^i q^{a+i}. \]

Such, therefore, is the probability for A to be ruined at exactly the
\((a + 2i)\)th game. The remarkable simplicity of this expression obtained
by means which are not quite elementary leads to a suspicion that it 
might also be obtained in a simple way. And, indeed, there is a simple 
way to arrive at this expression and thus to have a third, elementary, 
solution of Prob. 3. 

Considering the possible results of a series of \(a + 2i\) games, let A stand for a game won by A, and B for a game lost by A. The result of every series is then represented by a succession of letters A and B. We are interested in finding all the sequences which ruin A at exactly the last game. Because the fortune of A sinks from \(a\) to 0, there must be \(i\) letters A and \(i + a\) letters B in every sequence we consider. There is also another important condition. Imagine that the sequence is divided into two arbitrary parts, one containing the first letter and the other the last letter of the sequence. Let \(x\) be the number of letters B, and \(y\) the number of letters A, in the second or right part of the sequence. There will be \(a + i - x\) letters B and \(i - y\) letters A in the first or left part. It means that the fortune of A after the game corresponding to the last letter in the left part becomes

\[ a + (i - y) - (a + i - x) = x - y, \]

and since A cannot be ruined before the \((a + 2i)\)th game, \(x\) must always be \(> y\). That is, counting letters A and B from the right end of the sequence, the number of letters B must surpass the number of letters A at every stage. Conversely, if this condition is satisfied, the succession represents a series of games resulting in the ruin of A at the end of the series and not before.

To find directly the number of sequences satisfying this requirement is not so easy. It is much easier, following an ingenious method proposed by D. André, to find the number of all the remaining sequences of \(i\) letters A and \(i + a\) letters B. These can be divided into two classes: those ending with A and those ending with B. It is easy to show that there exists a one-to-one correspondence between sequences of these two classes, so that both classes contain the same number of sequences.

For, in a sequence of the second class (ending with B), starting from the right end, we necessarily find a shortest group of letters containing A and B in equal numbers. This group, read from the right, must end with A. Writing the letters of this group in reverse order, without changing the preceding letters, we obtain a sequence of the first class ending with A. Conversely, in a sequence of the first class there is a shortest group at the right end that ends (read from the right) with B and contains an equal number of letters A and B. Writing the letters of this group in reverse order, we obtain a sequence of the second class.

An example will illustrate how this one-to-one correspondence between sequences of the first and of the second class is established. Consider a sequence of the first class

B|BBABAA.

The vertical bar separates off the shortest group, counted from the right, that contains letters A and B in equal numbers. Reversing the order of the letters in this group, we obtain a sequence of the second class

B|AABABB,

and this sequence, by application of the above rule, is transformed again into the original sequence of the first class. The number of sequences of the first class can now be easily found. It is the same as the number of all possible sequences of \(i - 1\) letters A and \(a + i\) letters B, that is,

\[ \frac{(a + 2i - 1)!}{(i - 1)!\,(a + i)!} = \frac{(a+i+1)(a+i+2)\cdots(a+2i-1)}{1\cdot 2\cdots(i-1)}. \]
The total number of sequences in both classes is

\[ 2\,\frac{(a+i+1)(a+i+2)\cdots(a+2i-1)}{1\cdot 2\cdots(i-1)}. \]

Subtracting this from the number \(\binom{a+2i}{i}\) of all sequences of \(i\) letters A and \(a + i\) letters B, we find that the number of sequences leading to ruin of A in exactly \(a + 2i\) games is

\[ \frac{(a+i+1)(a+i+2)\cdots(a+2i)}{1\cdot 2\cdots i} - 2\,\frac{(a+i+1)(a+i+2)\cdots(a+2i-1)}{1\cdot 2\cdots(i-1)} = \frac{a(a+i+1)\cdots(a+2i-1)}{1\cdot 2\cdots i}. \]

Every such sequence indicates the same gains and losses, and therefore has the same probability, namely \(p^i q^{a+i}\). Hence the probability of the ruin of A in exactly \(a + 2i\) games is

\[ \frac{a(a+i+1)\cdots(a+2i-1)}{1\cdot 2\cdot 3\cdots i}\,p^i q^{a+i}, \]

and the second expression found for \(y_{a,n}\) follows immediately.
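André's argument can be tried out on small cases. The sketch below is illustrative only, and the helper names are hypothetical. The letters follow the text: A is a game won by A, B a game lost. The sketch implements the reversal of the shortest right-end group, applies it to the example B|BBABAA, and counts by brute force the sequences that ruin A at exactly the last game. It then compares that count with \(a(a+i+1)\cdots(a+2i-1)/i!\).

```python
from itertools import combinations
from math import factorial


def ruined_exactly_at_end(seq, a):
    """True if A, starting with fortune a, first reaches 0 at the last letter."""
    fortune = a
    for k, letter in enumerate(seq):
        fortune += 1 if letter == "A" else -1
        if fortune == 0:
            return k == len(seq) - 1
    return False


def andre_reflect(seq):
    """Reverse the shortest right-end group containing equally many A's and B's."""
    diff = 0
    for k in range(len(seq) - 1, -1, -1):
        diff += 1 if seq[k] == "A" else -1
        if diff == 0:
            return seq[:k] + seq[k:][::-1]
    return None   # no such group: B stays ahead from the right at every stage


def count_by_enumeration(a, i):
    """Sequences of i letters A and a+i letters B ruining A exactly at the end."""
    n = a + 2 * i
    count = 0
    for wins in combinations(range(n), i):
        seq = "".join("A" if k in wins else "B" for k in range(n))
        count += ruined_exactly_at_end(seq, a)
    return count


def count_by_formula(a, i):
    """a(a+i+1)...(a+2i-1)/i!, written as a(a+2i-1)!/(i!(a+i)!)."""
    return a * factorial(a + 2 * i - 1) // (factorial(i) * factorial(a + i))


print(andre_reflect("BBBABAA"))       # BAABABB, i.e. B|AABABB
print(count_by_enumeration(2, 4), count_by_formula(2, 4))
```

Applying `andre_reflect` twice restores the original sequence, which is the one-to-one correspondence used in the text.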

The problem concerning the probability of ruin in the course of a 
prescribed number of games for a player playing against an infinitely 
rich adversary was first considered by de Moivre, who gave both the 
preceding solutions without proof; it was later solved completely by 
Lagrange and Laplace. The elementary treatment can be found in
Bertrand's “Calcul des probabilités.”

6. Formulas (11) and (12), though elegant and useful when \(n\) is not large, become impracticable when \(n\) is somewhat large, and that is precisely the most interesting case. The risk of ruin incurred in playing equitable games is of special interest, so it is worth indicating here, though without proof, a convenient approximate expression for the probability \(y_{a,n}\) in the case of a large \(n\) and \(p = q = \tfrac12\). Let \(\tau\) be defined by

\[ \tau = \frac{a}{\sqrt{2(n+1)}}; \]

then for \(n \ge 50\) it is possible to establish the approximate formula

\[ y_{a,n} = 1 - \frac{2}{\sqrt{\pi}}\int_0^{\tau} e^{-u^2}\,du + \frac{\theta}{6n}, \]

where \(-1 < \theta < 1\). Suppose, for instance, that the fortune of a player amounts to $100, each stake being $1, and he decides to play 1,000, 5,000, 10,000, 100,000 or 1,000,000 games. Corresponding to these cases, we find

\[ \tau = 2.2350,\ 0.9999,\ 0.7071,\ 0.2236,\ 0.0707 \]

and hence

\[ r = \frac{2}{\sqrt{\pi}}\int_0^{\tau} e^{-u^2}\,du = 0.9984,\ 0.8427,\ 0.6827,\ 0.2482,\ 0.0796. \]

The corresponding approximate values of \(y_{100,n}\) are

0.0016, 0.1573, 0.3173, 0.7518, 0.9204.

Thus, for a player possessing $100 there is very little risk of being ruined 
in the course of 1,000 games even if he stakes $1 at each game. The risk 
is considerably larger, but still fairly small, when 5,000 games are played. 
In 10,000 games we can bet 2 to 1 that the player will still be able to 
continue. But when the limit set for the number of games becomes 
100,000, we can bet 3 to 1 that the player will be ruined somewhere in the 
course of those 100,000 games. Finally, there is little chance to escape 
ruin in a series of 1,000,000 games. The risk of ruin naturally increases 
with the number of games, but not so fast as might appear at first sight. 
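The numerical illustration above can be reproduced with Python's `math.erf`, which computes \(\frac{2}{\sqrt\pi}\int_0^x e^{-u^2}\,du\). The sketch ignores the error term. As an independent check in one case, it also computes the exact value for \(p = q = \tfrac12\) from the reflection principle, \(y_{a,n} = P(S_n \le -a) + P(S_n < -a)\), where \(S_n\) is A's net gain after \(n\) games. That principle is an assumption brought in for this sketch and is not used in the text. The function names are hypothetical, and the printed values may differ from those quoted above in the last digit because of rounding.

```python
from fractions import Fraction
from math import comb, erf, sqrt


def ruin_approx(a, n):
    """1 - erf(tau) with tau = a / sqrt(2(n+1)); the error term is ignored."""
    tau = a / sqrt(2 * (n + 1))
    return 1 - erf(tau)


def ruin_exact_fair(a, n):
    """Exact y_{a,n} for p = q = 1/2 by the reflection principle (assumption)."""
    total = 0
    for wins in range(n + 1):
        s = 2 * wins - n                    # A's net gain after n games
        total += comb(n, wins) * ((s <= -a) + (s < -a))
    return Fraction(total, 2**n)


for n in (1_000, 5_000, 10_000, 100_000, 1_000_000):
    tau = 100 / sqrt(2 * (n + 1))
    print(f"n = {n:>9,}   tau = {tau:.4f}   y ~ {ruin_approx(100, n):.4f}")

print(float(ruin_exact_fair(100, 1_000)), ruin_approx(100, 1_000))
```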

7. We conclude this chapter by solving the following problem, 
where the fortunes of both players are finite. 

Problem 4. Players A and B agree to play not more than \(n\) games, the probabilities of winning a single game being \(p\) and \(q\), respectively. Assuming that the fortunes of A and B amount to \(a\) and \(b\) single stakes, which are equal for both, find the probability for A to be ruined in the course of \(n\) games.

Solution. Let \(z_{x,t}\) be the probability for the player A to be ruined when his fortune is \(x\) (and that of his adversary is \(a + b - x\)) and he can play only \(t\) games. Evidently \(z_{x,t}\) satisfies the equation

\[ z_{x,t} = p\,z_{x+1,t-1} + q\,z_{x-1,t-1}, \tag{13} \]

which is perfectly similar to equation (8), but the complementary conditions serving to determine \(z_{x,t}\) completely are different. First we have

\[ z_{0,t} = 1 \quad \text{for } t \ge 0. \tag{14} \]

Next,

\[ z_{a+b,t} = 0 \quad \text{for } t \ge 0, \tag{15} \]

because if A gets all the money from B, the games stop and A cannot be ruined. Finally,

\[ z_{x,0} = 0 \quad \text{for } x = 1, 2, 3, \ldots, a + b - 1, \tag{16} \]

because A, having money left at the end of play, naturally cannot be ruined.

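Since (13) has two series of particular solutions

\[ \alpha^x \beta^t \quad\text{and}\quad \alpha'^x \beta^t, \]

where \(\alpha\) and \(\alpha'\) are the roots of the equation

\[ p\alpha^2 - \alpha\beta + q = 0, \]

both developable into series of descending powers of \(\beta\) for \(|\beta| > 1\), we shall seek \(z_{x,t}\) in the form

\[ z_{x,t} = \frac{1}{2\pi i}\int \left[f(\beta)\,\alpha^x + \varphi(\beta)\,\alpha'^x\right]\beta^t\,d\beta. \]

Here the integration is made along a circle of sufficiently large radius, and \(f(\beta)\) and \(\varphi(\beta)\) are two unknown functions which can be developed into series of descending powers of \(\beta\). Obviously \(z_{x,t}\) satisfies (13) identically in \(x\) and \(t\). For \(x = 0\) and \(t \ge 0\) we have the condition

\[ \frac{1}{2\pi i}\int \left[f(\beta) + \varphi(\beta)\right]\beta^t\,d\beta = 1; \qquad t = 0, 1, 2, \ldots, \]

which is satisfied if

\[ f(\beta) + \varphi(\beta) = \frac{1}{\beta - 1}. \tag{17} \]

Condition (15) will be satisfied if

\[ \alpha^{a+b} f(\beta) + \alpha'^{a+b}\varphi(\beta) = 0, \tag{18} \]

and it remains to show that at the same time (16) is satisfied. Solving (17) and (18), we have

\[ f(\beta) = \frac{\alpha'^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{1}{\beta - 1}, \qquad \varphi(\beta) = -\frac{\alpha^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{1}{\beta - 1}, \]

and, since \(\alpha\alpha' = q/p\),

\[ f(\beta)\,\alpha^x + \varphi(\beta)\,\alpha'^x = \frac{\alpha'^{a+b}\alpha^x - \alpha^{a+b}\alpha'^x}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} = \left(\frac{q}{p}\right)^x \frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})}. \tag{19} \]

Now let \(\alpha\) be the root vanishing for \(\beta = \infty\), and \(\alpha'\) the other root, whose development in a series of descending powers of \(\beta\) starts with the term containing \(\beta\). Evidently the development of (19) for

\[ x = 1, 2, 3, \ldots, a + b - 1 \]

does not contain terms involving the first power of \(1/\beta\). Hence \(z_{x,0} = 0\) if \(x = 1, 2, 3, \ldots, a + b - 1\), as it should be. The solution of (13) satisfying (14), (15) and (16) being unique, its analytical expression is therefore

\[ z_{x,t} = \left(\frac{q}{p}\right)^x \frac{1}{2\pi i}\int \frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^t\,d\beta}{\beta - 1}, \]

whence for \(x = a\) and \(t = n\)

\[ z_{a,n} = \left(\frac{q}{p}\right)^a \frac{1}{2\pi i}\int \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^n\,d\beta}{\beta - 1}. \]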

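To find an explicit expression for \(z_{a,n}\), it remains to find the coefficient of \(1/\beta\) in the development of

\[ P = \left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^n}{\beta - 1} \]

in a series of descending powers of \(\beta\). This can be done in two different ways. First, we can substitute for \(\alpha'\) its expression in \(\alpha\):

\[ \alpha' = \frac{q}{p}\,\alpha^{-1}, \]

and present \(P\) in the form

\[ P = \frac{\alpha^a - \left(\frac{p}{q}\right)^b \alpha^{a+2b}}{1 - \left(\frac{p}{q}\right)^{a+b}\alpha^{2a+2b}}\cdot\frac{\beta^n}{\beta - 1}, \]

or, developing into series,

\[ P = \left[\alpha^a - \left(\frac{p}{q}\right)^b\alpha^{a+2b} + \left(\frac{p}{q}\right)^{a+b}\alpha^{3a+2b} - \left(\frac{p}{q}\right)^{a+2b}\alpha^{3a+4b} + \cdots\right]\frac{\beta^n}{\beta - 1}. \]

But the coefficient of \(1/\beta\) in

\[ \alpha^m\,\frac{\beta^n}{\beta - 1} \]

is, by the second solution of Prob. 3, the probability \(y_{m,n}\) for a player with a fortune \(m\) to be ruined by an infinitely rich player in the course of \(n\) games. Hence, the final expression for \(z_{a,n}\) is

\[ z_{a,n} = y_{a,n} - \left(\frac{p}{q}\right)^b y_{a+2b,n} + \left(\frac{p}{q}\right)^{a+b} y_{3a+2b,n} - \left(\frac{p}{q}\right)^{a+2b} y_{3a+4b,n} + \cdots, \]

the terms of this series being alternately of the form

\[ \left(\frac{p}{q}\right)^{k(a+b)} y_{(2k+1)a+2kb,\,n} \quad\text{and}\quad -\left(\frac{p}{q}\right)^{k(a+b)+b} y_{(2k+1)a+(2k+2)b,\,n} \]

for \(k = 0, 1, 2, \ldots\). The series stops by itself as soon as the first subscript of \(y_{x,n}\) becomes greater than \(n\), because \(y_{m,n} = 0\) for \(m > n\).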
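The alternating series for \(z_{a,n}\) can be checked against a direct solution of (13) with conditions (14), (15) and (16). In this sketch the function names are hypothetical, all arithmetic uses exact fractions, and \(y_{m,n}\) is taken from recursion (8) rather than from (11) or (12), which give the same values.

```python
from fractions import Fraction
from functools import lru_cache


def one_sided_ruin(m, n, p):
    """y_{m,n}: ruin within n games against an infinitely rich adversary, (8)-(10)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)

    return y(m, n)


def two_sided_ruin_recursion(a, b, n, p):
    """z_{a,n} directly from (13) with conditions (14), (15), (16)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def z(x, t):
        if x == 0:
            return Fraction(1)          # (14)
        if x == a + b:
            return Fraction(0)          # (15)
        if t == 0:
            return Fraction(0)          # (16)
        return p * z(x + 1, t - 1) + q * z(x - 1, t - 1)   # (13)

    return z(a, n)


def two_sided_ruin_series(a, b, n, p):
    """z_{a,n} as the alternating series in y_{m,n}; stops once m > n."""
    r = p / (1 - p)                     # p/q
    total = Fraction(0)
    k = 0
    while True:
        m = (2 * k + 1) * a + 2 * k * b
        if m > n:
            break
        total += r ** (k * (a + b)) * one_sided_ruin(m, n, p)
        m = (2 * k + 1) * a + (2 * k + 2) * b
        if m > n:
            break
        total -= r ** (k * (a + b) + b) * one_sided_ruin(m, n, p)
        k += 1
    return total


p = Fraction(1, 2)
print(two_sided_ruin_series(2, 3, 10, p), two_sided_ruin_recursion(2, 3, 10, p))
```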

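To obtain a second expression of \(z_{a,n}\), we notice that

\[ \left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}} = \left(\frac{q}{p}\right)^a \frac{(\alpha'^{b} - \alpha^{b})/(\alpha' - \alpha)}{(\alpha'^{a+b} - \alpha^{a+b})/(\alpha' - \alpha)} \]

is a rational function of \(\beta\) whose denominator

\[ R = \frac{\alpha'^{a+b} - \alpha^{a+b}}{\alpha' - \alpha} \]

is a polynomial in \(\beta\) of degree \(a + b - 1\). To find the roots of \(R = 0\), we set \(\beta = 2\sqrt{pq}\cos\varphi\). Since the roots \(\alpha, \alpha'\) are then

\[ \sqrt{\frac{q}{p}}\,e^{\pm i\varphi}, \]

we have

\[ R = \left(\frac{q}{p}\right)^{\frac{a+b-1}{2}}\frac{\sin(a+b)\varphi}{\sin\varphi}. \]

The equation

\[ \frac{\sin(a+b)\varphi}{\sin\varphi} = 0 \]

has the roots

\[ \varphi_h = \frac{h\pi}{a+b}, \qquad h = 1, 2, \ldots, a + b - 1, \]

so the \(a + b - 1\) roots of \(R\) are

\[ \beta_h = 2\sqrt{pq}\cos\varphi_h. \]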

Now we can resolve the rational function \(P\) into a sum of simple elements as follows:

\[ P = E(\beta) + \frac{A_0}{\beta - 1} + \sum_{h=1}^{a+b-1}\frac{A_h}{\beta - \beta_h}, \]

where

\[ A_0 = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}, \]

and, for \(h > 0\),

\[ A_h = -\frac{(2\sqrt{pq})^{n+1}}{a+b}\left(\frac{q}{p}\right)^{a/2}\frac{\sin\varphi_h\,\sin a\varphi_h\,(\cos\varphi_h)^n}{1 - 2\sqrt{pq}\cos\varphi_h}, \]

while \(E(\beta)\) is the integral part of \(P\). The coefficient of \(1/\beta\) in the development of \(P\) is

\[ A_0 + \sum_{h=1}^{a+b-1} A_h, \]

and so we have a new explicit expression for \(z_{a,n}\):


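\[ z_{a,n} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}} - \frac{(2\sqrt{pq})^{n+1}}{a+b}\left(\frac{q}{p}\right)^{a/2}\sum_{h=1}^{a+b-1}\frac{\sin\frac{\pi h}{a+b}\,\sin\frac{\pi a h}{a+b}\left(\cos\frac{\pi h}{a+b}\right)^{n}}{1 - 2\sqrt{pq}\cos\frac{\pi h}{a+b}}. \tag{20} \]

Since \(2\sqrt{pq}\,\bigl|\cos\frac{\pi h}{a+b}\bigr| < 1\) for \(h = 1, \ldots, a + b - 1\), this expression shows clearly that \(z_{a,n}\), with increasing \(n\), approaches the limit

\[ \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}, \]

which represents the probability of ruin when the number of games is unlimited, in complete accord with the solution of Prob. 1.

The first term in (20) naturally must be replaced by \(b/(a+b)\) in case \(p = q = \tfrac12\). This form of solution was given first by Lagrange.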
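Formula (20) involves only a finite sum, so it is convenient for computation. The sketch below works in floating point, and its function names are hypothetical. It evaluates (20), using \(b/(a+b)\) as the first term when \(p = q\), and compares the result with a direct solution of (13) to (16). For example, `lagrange_formula_20(2, 3, 20, 0.5)` gives the probability that A, with 2 stakes against B's 3, is ruined within 20 equitable games.

```python
from functools import lru_cache
from math import cos, pi, sin, sqrt


def lagrange_formula_20(a, b, n, p):
    """z_{a,n} from formula (20), in floating point."""
    q = 1 - p
    N = a + b
    if p == q:
        first = b / N                   # limiting form of the first term
    else:
        first = q**a * (p**b - q**b) / (p**N - q**N)
    s = 2 * sqrt(p * q)
    acc = 0.0
    for h in range(1, N):
        phi = pi * h / N
        acc += sin(phi) * sin(a * phi) * cos(phi) ** n / (1 - s * cos(phi))
    return first - s ** (n + 1) * (q / p) ** (a / 2) * acc / N


def two_sided_ruin_recursion(a, b, n, p):
    """z_{a,n} from (13)-(16), used as a reference value."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def z(x, t):
        if x == 0:
            return 1.0
        if x == a + b or t == 0:
            return 0.0
        return p * z(x + 1, t - 1) + q * z(x - 1, t - 1)

    return z(a, n)


print(lagrange_formula_20(2, 3, 20, 0.5), two_sided_ruin_recursion(2, 3, 20, 0.5))
```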


Problems for Solution 

1. Players A and B with fortunes of $50 and $100, respectively, agree to play until one of them is ruined. The probabilities of winning a single game are 2/3 and 1/3, respectively, for A and B, and they stake $1 at each game. What is the probability of ruin for the player A? Ans. Very nearly \(2^{-50} = 8.88 \cdot 10^{-16}\).

2. If A and B at each single game stake $3 and $2, respectively, and have fortunes of $30 and $20 at the beginning, what is the approximate value of the probability that A will be ruined if the probability of his winning a single game is (a) p = …; (b) p = …?

Ans. (a) 0.40 + Δ, |Δ| < 1.7 × 10^…; (b) 0.96 + Δ, |Δ| < 4.6 × 10^….

3. A player A with the fortune $a plays an unlimited number of games against an infinitely rich adversary, with the probability p of winning a single game. He stakes $1 at each game, while his rich adversary stakes such a sum β as to make the game favorable to A. What is the probability that A will be ruined in the course of the games? Give numerical results if (a) a = 10, p = 1/2, β = 3; (b) a = 100, p = 1/2, β = 3. Ans. Let θ < 1 be a positive root of the equation \(p\theta^{\beta+1} - \theta + q = 0\). The required probability P is \(P = \theta^a\).

In case (a) P = 0.002257; in case (b) \(P = 3.43 \cdot 10^{-27}\).
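For Problem 3 the root \(\theta\) can be found numerically. The method here is our own choice, not the text's: the sketch iterates \(\theta \leftarrow q + p\theta^{\beta+1}\) starting from \(\theta = 0\). The iterates increase to the smallest positive root, which lies below 1 when \(p\beta > q\). The function name is hypothetical. With \(p = \tfrac12\) and \(\beta = 3\) in both cases, these values reproduce the quoted answers for \(a = 10\) and \(a = 100\).

```python
def ruin_probability_unequal_stakes(a, p, beta, tol=1e-15, max_iter=100_000):
    """theta**a, where theta is the root in (0, 1) of p*theta**(beta+1) - theta + q = 0.

    theta is obtained by the fixed-point iteration theta <- q + p*theta**(beta+1)
    started at 0, which increases monotonically to the smallest positive root.
    """
    q = 1 - p
    theta = 0.0
    for _ in range(max_iter):
        nxt = q + p * theta ** (beta + 1)
        if abs(nxt - theta) < tol:
            theta = nxt
            break
        theta = nxt
    return theta ** a


print(ruin_probability_unequal_stakes(10, 0.5, 3))
print(ruin_probability_unequal_stakes(100, 0.5, 3))
```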

4 . A player A whose fortune is $10 agrees to play not more than 20 games against 
an infinitely rich adversary, both staking $1 with an equal probability of winning a 
single game. What is the probability that A will not be ruined in the course of 
20 games? Ans. 0.9734.
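For Problem 4 an exact value follows from recursion (8) to (10) with \(a = 10\), \(n = 20\) and \(p = q = \tfrac12\). In the sketch below (hypothetical function name, exact fractions), the probability of not being ruined comes out close to 0.9734.

```python
from fractions import Fraction
from functools import lru_cache


def ruin_within(a, n, p=Fraction(1, 2)):
    """y_{a,n} from recursion (8)-(10), computed exactly."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)

    return y(a, n)


survive = 1 - ruin_within(10, 20)      # probability that A is NOT ruined
print(survive, float(survive))
```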

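5. Players A and B with $1 and $2, respectively, agree to play not more than n equitable games, staking $1 at each game. What are the probabilities of their ruin?

Ans. For A: \(\dfrac{2}{3} - \dfrac{3 + (-1)^n}{3\cdot 2^{n+1}}\), that is, \(\dfrac{2}{3} - \dfrac{\epsilon}{3\cdot 2^{n}}\) with \(\epsilon = 1\) if n is odd and \(\epsilon = 2\) if n is even. For B: \(\dfrac{1}{3} - \dfrac{3 - (-1)^n}{3\cdot 2^{n+1}}\).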

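6. Players A and B with $2 and $3, respectively, play a series of equitable games, both staking $1 at each game. What are the probabilities of their ruin in n games? Give the numerical result if n = 20. Ans. For A:

\[ \frac35 - \frac45\left[\left(\frac{\sqrt5 + 1}{4}\right)^{n+\epsilon} + \left(\frac{\sqrt5 - 1}{4}\right)^{n+\epsilon}\right], \qquad \epsilon = 1 \text{ if } n \text{ is odd},\ \epsilon = 2 \text{ if } n \text{ is even}; \]

for B:

\[ \frac25 - \frac45\left[\left(\frac{\sqrt5 + 1}{4}\right)^{n+\epsilon} - \left(\frac{\sqrt5 - 1}{4}\right)^{n+\epsilon}\right], \qquad \epsilon = 1 \text{ if } n \text{ is even},\ \epsilon = 2 \text{ if } n \text{ is odd}. \]

7. Find the expression of the probability of the ruin of A when his adversary B is infinitely rich, corresponding to formula (20). Ans. From the definition of a definite integral it follows that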


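\[ y_{a,n} = y_{a,\infty} - \frac{(2\sqrt{pq})^{n+1}}{\pi}\left(\frac{q}{p}\right)^{a/2}\int_0^{\pi}\frac{\sin\varphi\,\sin a\varphi\,(\cos\varphi)^n}{1 - 2\sqrt{pq}\cos\varphi}\,d\varphi, \]

where

\[ y_{a,\infty} = 1 \ \text{ if } p \le q, \qquad y_{a,\infty} = \left(\frac{q}{p}\right)^a \ \text{ if } p > q. \]

If the games are equitable and \(n\) differs from \(a\) by an even number, then

\[ y_{a,n} = 1 - \frac{2}{\pi}\int_0^{\pi/2}\frac{\sin a\varphi}{\sin\varphi}\,(\cos\varphi)^{n+1}\,d\varphi. \]

This formula was given by Laplace.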
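Laplace's formula in Problem 7 can be checked by numerical integration. The sketch below is our own illustration, and its names are hypothetical. It uses composite Simpson's rule with a fixed number of subintervals, which is an arbitrary choice, and compares the result with exact values from recursion (8) for equitable games with \(n - a\) even.

```python
from fractions import Fraction
from functools import lru_cache
from math import cos, pi, sin


def laplace_formula(a, n, intervals=4000):
    """1 - (2/pi) * integral_0^{pi/2} sin(a*phi)/sin(phi) * cos(phi)**(n+1) dphi,
    evaluated with composite Simpson's rule (intended for n - a even)."""
    def integrand(phi):
        ratio = a if phi == 0.0 else sin(a * phi) / sin(phi)   # limit is a at phi = 0
        return ratio * cos(phi) ** (n + 1)

    h = (pi / 2) / intervals            # intervals must be even for Simpson's rule
    total = integrand(0.0) + integrand(pi / 2)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 1 - (2 / pi) * total * h / 3


def ruin_fair(a, n):
    """Exact y_{a,n} for p = q = 1/2 from recursion (8)-(10)."""
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return half * (y(x + 1, t - 1) + y(x - 1, t - 1))

    return y(a, n)


print(laplace_formula(3, 15), float(ruin_fair(3, 15)))
```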

8. Referring to the last formula in the preceding problem, show that

\[ y_{a,n} = 1 - \frac{2}{\sqrt{\pi}}\int_0^{\tau} e^{-u^2}\,du + \Delta, \]

where

\[ \tau = \frac{a}{\sqrt{2(n+1)}}, \]




^ 32. 
Indication of the Proof. It is important to prove the following inequalities first


ip (cos 


-!L±J^a 


Sin ip 

^(cos ip )*'^^ 


< e 


for 


0<^s- 


n + 1 . (n+l)<>« 


Sin ip 


> e 


for 


0 <ip^ 


whence 


ip (cos 
sin ip 


1^1 - 9 ” ^ V *j; 0 < 9 < 1 


provided 0 < ^ ^ tt/A. The rest of the proof is easy. 

9. Attempt a direct proof of the important lemma (page 144) used in the discussion of Prob. 2.

Hint: The proof can be based upon the following proposition, which generalizes an important theorem on determinants due to Minkowski. Let

\[ f_i = a_{1i}x_1 + a_{2i}x_2 + \cdots + a_{ni}x_n; \qquad i = 1, 2, 3, \ldots, n \]

be a system of linear forms whose coefficients satisfy the following conditions:

(1) \(a_{ii} > 0\); \(a_{ki} \le 0\) if \(k \ne i\); \(a_{1i} + a_{2i} + \cdots + a_{ni} \ge 0\).

(2) One of these sums is positive.

If these forms assume nonnegative values, then every \(x_i \ge 0\) \((i = 1, 2, \ldots, n)\). Proof by induction: express \(x_n\) through \(f_n, x_1, \ldots, x_{n-1}\) thus:

\[ x_n = \frac{f_n - a_{1n}x_1 - a_{2n}x_2 - \cdots - a_{n-1,n}x_{n-1}}{a_{nn}}, \]

and substitute into the remaining forms. Show that the resulting forms in \(x_1, x_2, \ldots, x_{n-1}\) satisfy the same conditions (1) and (2). Hence, it remains to prove the proposition for two forms, which can easily be done.
CHAPTER IX


MATHEMATICAL EXPECTATION 

1. Bernoulli's theorem, important though it is, is but the first link in a chain of theorems of the same character, all contained in an extremely general proposition with which we shall deal in the next chapter. But before proceeding to this task, it is necessary to extend the definition of “mathematical expectation,” an important concept originating in connection with games of chance.

If, according to the conditions of the game, the player can win a sum \(a\) with probability \(p\), and lose a sum \(b\) with probability \(q = 1 - p\), the mathematical expectation of his gain is by definition

\[ pa - qb. \]

Considering the loss as a negative gain, we may say that the gain of the player may have only two values, \(a\) and \(-b\), with the corresponding probabilities \(p\) and \(q\). The expectation of his gain is then the sum of the products of the two possible values of the gain by their probabilities. In this case, the gain appears as a variable quantity possessing two values.

Variable quantities with a definite range of values, each of which, depending on chance, can be attained with a definite probability, are called “chance variables,” or, using a Greek term, “stochastic” variables. They play an important part in the theory of probability. A stochastic variable is defined (a) if the set of its possible values is given, and (b) if the probability of attaining each particular value is also given.

It is easy to give examples of stochastic variables. The gain in a game of chance is a stochastic variable with two values. The number of points on a tossed die is a stochastic variable with six values, 1, 2, ..., 6, each of which has the same probability 1/6. The number on a ticket drawn from an urn containing 20 tickets numbered from 1 to 20 is a stochastic variable with 20 values, and the probability of attaining any one of them is 1/20. Each of two urns contains 2 white and 2 black balls. Simultaneously, one ball is transferred from the first urn into the second, while one ball from the latter is transferred into the first. After this exchange, the number of white balls in one of the urns may be regarded as a stochastic variable with three values, 1, 2, 3, whose corresponding probabilities are, respectively, 1/4, 1/2, 1/4. It is natural to extend the concept of mathematical expectation to stochastic variables in general.
Suppose that a stochastic variable \(x\) possesses \(n\) values:

\[ x_1, x_2, \ldots, x_n, \]

and let

\[ p_1, p_2, \ldots, p_n \]

denote the respective probabilities for \(x\) to assume the values \(x_1, x_2, \ldots, x_n\). By definition the mathematical expectation of \(x\) is

\[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_nx_n. \]


It is understood in this definition that the possible values of the variable \(x\) are numerically different. For instance, if the variable is the number of points on a die, its numerically different values are 1, 2, 3, 4, 5, 6, each having the same probability 1/6. By definition, the mathematical expectation of the number of points on a die is

\[ \tfrac16(1 + 2 + 3 + 4 + 5 + 6) = 3.5. \]

If the variable is the number on a ticket drawn from an urn containing 20 tickets numbered from 1 to 20, its numerically different values are the numbers from 1 to 20. The probability of each of these values is 1/20, so the mathematical expectation of the number on a ticket is

\[ \tfrac{1}{20}(1 + 2 + \cdots + 20) = 10.5. \]
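The definition can be applied mechanically. The sketch below uses exact fractions, and the helper name `expectation` is hypothetical. It reproduces the two expectations just computed. It also enumerates the 16 equally likely exchanges in the two-urn example given earlier and recovers the distribution 1/4, 1/2, 1/4 of the number of white balls in the first urn.

```python
from fractions import Fraction
from itertools import product


def expectation(values, probabilities):
    """E(x) = p1*x1 + p2*x2 + ... + pn*xn."""
    return sum(p * x for x, p in zip(values, probabilities))


die = expectation(range(1, 7), [Fraction(1, 6)] * 6)
ticket = expectation(range(1, 21), [Fraction(1, 20)] * 20)
print(die, ticket)                                   # 7/2 21/2

# Urn exchange: each urn holds 2 white (W) and 2 black (B) balls, and one ball
# chosen at random passes from each urn to the other at the same time.
urn = "WWBB"
counts = {}
for out_ball, in_ball in product(urn, urn):          # 16 equally likely pairs
    whites = 2 - (out_ball == "W") + (in_ball == "W")  # whites left in urn 1
    counts[whites] = counts.get(whites, 0) + Fraction(1, 16)
print(sorted(counts.items()))
```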

2. It is obvious that the computation of mathematical expectation requires only the knowledge of the numerically different values of the variable with their respective probabilities. But in some cases this computation is greatly simplified by extending the definition of mathematical expectation. Suppose that, corresponding to mutually exclusive and exhaustive cases \(A_1, A_2, \ldots, A_m\), the variable \(x\) assumes the values \(x_1, x_2, \ldots, x_m\) with the corresponding probabilities \(p_1, p_2, \ldots, p_m\). We can then define the mathematical expectation of \(x\) by

\[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m. \]

What distinguishes this extended definition from the original one is that in the extended definition the values \(x_1, x_2, \ldots, x_m\) need not be numerically different. The only condition is that they are determined by mutually exclusive and exhaustive cases.

To make this distinction clear, suppose that the variable \(x\) is the number of points on two dice. The numerically different values of this variable are

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

and their respective probabilities are

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36.
Therefore, by the original definition, the expectation is

\[ \frac{2}{36} + \frac{6}{36} + \frac{12}{36} + \frac{20}{36} + \frac{30}{36} + \frac{42}{36} + \frac{40}{36} + \frac{36}{36} + \frac{30}{36} + \frac{22}{36} + \frac{12}{36} = 7. \]

But we can distinguish 36 exhaustive and mutually exclusive cases according to the number of points on each die and, correspondingly, 36 values of the variable \(x\), as shown in the following table:

First die   Second die    x   |   First die   Second die    x
    1           1         2   |       4           1         5
    1           2         3   |       4           2         6
    1           3         4   |       4           3         7
    1           4         5   |       4           4         8
    1           5         6   |       4           5         9
    1           6         7   |       4           6        10
    2           1         3   |       5           1         6
    2           2         4   |       5           2         7
    2           3         5   |       5           3         8
    2           4         6   |       5           4         9
    2           5         7   |       5           5        10
    2           6         8   |       5           6        11
    3           1         4   |       6           1         7
    3           2         5   |       6           2         8
    3           3         6   |       6           3         9
    3           4         7   |       6           4        10
    3           5         8   |       6           5        11
    3           6         9   |       6           6        12


The probability of each of these 36 cases is 1/36, so by the extended definition the mathematical expectation of \(x\) is

\[ \frac{2 + 3\cdot 2 + 4\cdot 3 + 5\cdot 4 + 6\cdot 5 + 7\cdot 6 + 8\cdot 5 + 9\cdot 4 + 10\cdot 3 + 11\cdot 2 + 12}{36} = 7, \]

as it should be.
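The two definitions can be compared directly. This sketch uses exact fractions and computes the expectation twice: once from the 36 cases of the table (extended definition), and once from the 11 numerically different values with their probabilities (original definition).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

cases = list(product(range(1, 7), repeat=2))     # 36 equally likely (first, second)

# Extended definition: one term per case; the values x may repeat.
extended = sum(Fraction(1, 36) * (first + second) for first, second in cases)

# Original definition: group equal values and use their probabilities.
distribution = {value: Fraction(count, 36)
                for value, count in Counter(f + s for f, s in cases).items()}
original = sum(p * value for value, p in distribution.items())

print(extended, original)                        # 7 7
```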

It is important to show that both definitions always give the same 
value for the mathematical expectation. 

Let \(x_1, x_2, \ldots, x_m\) be the values of the variable \(x\) corresponding to mutually exclusive and exhaustive cases \(A_1, A_2, \ldots, A_m\), and \(p_1, p_2, \ldots, p_m\) their respective probabilities. By the extended definition of mathematical expectation, we have

\[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m. \tag{1} \]
The values \(x_1, x_2, \ldots, x_m\) are not necessarily numerically different, the numerically different values being

\[ \xi, \eta, \zeta, \ldots, \lambda. \]

We can suppose that the notation is chosen in such a way that

\(x_1, x_2, \ldots, x_a\) are equal to \(\xi\);
\(x_{a+1}, x_{a+2}, \ldots, x_b\) are equal to \(\eta\);
\(x_{b+1}, x_{b+2}, \ldots, x_c\) are equal to \(\zeta\);
. . . . . . . . . . . . . . .
\(x_{l+1}, x_{l+2}, \ldots, x_m\) are equal to \(\lambda\).

Hence, the right-hand member of (1) can be represented thus:

\[ (p_1 + p_2 + \cdots + p_a)\xi + (p_{a+1} + p_{a+2} + \cdots + p_b)\eta + \cdots + (p_{l+1} + p_{l+2} + \cdots + p_m)\lambda. \]

But by the theorem of total probabilities, the sum

\[ p_1 + p_2 + \cdots + p_a \]

represents the probability \(P\) for the variable \(x\) to assume the determined value \(\xi\), because this can happen in \(a\) mutually exclusive ways, namely, when \(x = x_1\), or \(x = x_2\), ..., or \(x = x_a\). By a similar argument we see that the sums

\[ p_{a+1} + p_{a+2} + \cdots + p_b, \qquad p_{b+1} + p_{b+2} + \cdots + p_c, \qquad \ldots, \qquad p_{l+1} + p_{l+2} + \cdots + p_m \]

represent the probabilities \(Q, R, \ldots, T\) for the variable \(x\) to assume the values \(\eta, \zeta, \ldots, \lambda\). Therefore, the right-hand member of (1) reduces to the sum

\[ P\xi + Q\eta + R\zeta + \cdots + T\lambda, \]

which, by the original definition, is the mathematical expectation of \(x\).

If, corresponding to mutually exclusive and exhaustive cases, a variable \(x\) assumes the same value \(a\) (in other words, remains constant), it is almost evident that its mathematical expectation is \(a\), because the sum of the probabilities of mutually exclusive and exhaustive cases is 1. It is also evident that the expectation of \(ax\), where \(a\) is a constant, is equal to \(a\) times the expectation of \(x\).

Note: Very often the mathematical expectation of a stochastic variable is called its “mean value.”

Mathematical Expectation of a Sum

3. In many cases the computation of mathematical expectation is greatly facilitated by means of the following very general theorem:
Theorem. The mathematical expectation of the sum of several variables is equal to the sum of their expectations; or, in symbols,

\[ E(x + y + z + \cdots + w) = E(x) + E(y) + E(z) + \cdots + E(w). \]

Proof. We shall first prove this theorem in the case of a sum of two variables. Let \(x\) assume the numerically different values \(x_1, x_2, \ldots, x_m\), while the numerically different values of \(y\) are \(y_1, y_2, \ldots, y_n\). In regard to the sum \(x + y\) we can distinguish \(mn\) mutually exclusive cases, namely, when \(x\) assumes a definite value \(x_i\) and \(y\) another definite value \(y_j\), while \(i\) and \(j\) range respectively over the numbers \(1, 2, 3, \ldots, m\) and \(1, 2, 3, \ldots, n\). If \(p_{ij}\) denotes the probability of coexistence of the equalities

\[ x = x_i, \qquad y = y_j, \]

we have by the extended definition of mathematical expectation

\[ E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}(x_i + y_j) \]

or

\[ E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i + \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j. \tag{2} \]

The variable \(x\) assumes a definite value \(x_i\) in \(n\) mutually exclusive ways, namely, when the value \(x_i\) of \(x\) is accompanied by the values \(y_1, y_2, \ldots, y_n\) of \(y\). It is therefore obvious that the sum

\[ \sum_{j=1}^{n} p_{ij} \]

represents the probability \(p_i\) of the equality \(x = x_i\). In a similar manner we see that the sum

\[ \sum_{i=1}^{m} p_{ij} \]

represents the probability \(q_j\) of the equality \(y = y_j\). Therefore

\[ \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i = \sum_{i=1}^{m} x_i\sum_{j=1}^{n} p_{ij} = \sum_{i=1}^{m} p_i x_i = E(x) \]

and similarly

\[ \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j = \sum_{j=1}^{n} y_j\sum_{i=1}^{m} p_{ij} = \sum_{j=1}^{n} q_j y_j = E(y); \]

that is, by (2),

\[ E(x + y) = E(x) + E(y), \]

which proves the theorem for the sum of two variables.

If we deal with the sum of three variables \(x + y + z\), we may consider it at first as the sum of \(x + y\) and \(z\). Applying the foregoing result, we get

\[ E(x + y + z) = E(x + y) + E(z); \]

and again, by substituting \(E(x) + E(y)\) for \(E(x + y)\),

\[ E(x + y + z) = E(x) + E(y) + E(z). \]

In a similar way we may proceed further and prove the theorem for the sum of any number of variables.
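The theorem does not require the variables to be independent. As an illustration of our own, not taken from the text, the sketch defines \(x\) as the points on the first of two dice and \(y\) as the larger of the two numbers of points. It builds the joint probabilities \(p_{ij}\) and compares \(E(x + y)\) with \(E(x) + E(y)\), the latter computed from the marginal sums used in the proof.

```python
from fractions import Fraction
from itertools import product

# Joint probabilities p_ij for two dependent variables built from two dice:
# x = points on the first die, y = the larger of the two numbers of points.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, max(d1, d2))
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

e_sum = sum(p * (x + y) for (x, y), p in joint.items())     # E(x + y) directly

# Marginal probabilities p_i = sum_j p_ij and q_j = sum_i p_ij, as in the proof.
p_x, q_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    q_y[y] = q_y.get(y, 0) + p
e_x = sum(p * x for x, p in p_x.items())
e_y = sum(q * y for y, q in q_y.items())

print(e_sum, e_x + e_y)
```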

4. The theorem concerning mathematical expectation of sums, 
simple though it is, is of fundamental importance on account of its very 
general nature and will be used frequently. At present, we shall use it 
in the solution of a few selected problems. 

Problem 1. What is the mathematical expectation of the sum of 
points on n dice? 

Solution. Denoting by \(x_i\) the number of points on the \(i\)th die, the sum of the points on \(n\) dice will be

\[ s = x_1 + x_2 + \cdots + x_n, \]

and by the preceding theorem

\[ E(s) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But for every single die

\[ E(x_i) = \tfrac72, \qquad i = 1, 2, \ldots, n; \]

therefore

\[ E(s) = \tfrac{7n}{2}. \]
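As a check on Problem 1, the exact distribution of the total number of points on \(n\) dice can be built by repeated convolution, and its mean compared with \(7n/2\). This is a minimal sketch, and the function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def sum_of_dice_distribution(n):
    """Exact distribution of the total number of points on n dice (by convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)              # Fraction() == 0
        for total, prob in dist.items():
            for face in range(1, 7):
                nxt[total + face] += prob / 6
        dist = dict(nxt)
    return dist


for n in range(1, 6):
    dist = sum_of_dice_distribution(n)
    mean = sum(p * s for s, p in dist.items())
    print(n, mean, Fraction(7 * n, 2))
```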

Problem 2. What is the mathematical expectation of the number of 
successes in n trials with constant probability p? 

Solution. Suppose that we attach to every trial a variable which has the value 1 in case of a success and the value 0 in case of failure. If the variables attached to trials 1, 2, 3, ..., n are denoted by \(x_1, x_2, \ldots, x_n\), their sum

\[ m = x_1 + x_2 + \cdots + x_n \]

obviously gives the number of successes in \(n\) trials. Therefore, the required expectation is

\[ E(m) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But for every \(i = 1, 2, 3, \ldots, n\)

\[ E(x_i) = p\cdot 1 + (1 - p)\cdot 0 = p, \]

because \(x_i\) may have the values 1 and 0 with the probabilities \(p\) and \(1 - p\), which are the same as the probabilities of a success or a failure in the \(i\)th trial. Hence,

\[ E(m) = np \]

or

\[ E(m - np) = 0, \]

which may also be written in the form

\[ \sum_{m=0}^{n} T_m(m - np) = 0, \]

where \(T_m\) denotes the probability of exactly \(m\) successes in \(n\) trials.

This result was obtained on page 116 in a totally different and more 
complicated way. The new deduction is preferable in that it is more 
elementary and can easily be extended to more complicated cases, as 
we shall see in the next problem. 

Problem 3. Suppose that we have a series of \(n\) trials, independent or not, the probability of an event being \(p_i\) in the \(i\)th trial when nothing is known about the results of the other trials. What is the mathematical expectation of the number of successes \(m\) in \(n\) trials?

Solution. Again let us introduce the variable \(x_i\) connected with the \(i\)th trial in such a way that \(x_i = 1\) when the trial results in a success and \(x_i = 0\) when it results in failure. Obviously,

\[ m = x_1 + x_2 + \cdots + x_n \]

and

\[ E(m) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But

\[ E(x_i) = 1\cdot p_i + 0\cdot(1 - p_i) = p_i, \]

and therefore

\[ E(m) = p_1 + p_2 + \cdots + p_n. \]

For instance, suppose we have 5 urns containing 1 white and 9 black; 2 white and 8 black; 3 white and 7 black; 4 white and 6 black; 5 white and 5 black balls, and we draw one ball out of every urn. The mathematical expectation of the number of white balls taken will be

\[ E(m) = \tfrac{1}{10} + \tfrac{2}{10} + \tfrac{3}{10} + \tfrac{4}{10} + \tfrac{5}{10} = 1.5. \]
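For the five-urn example, the full distribution of \(m\) can also be computed, and its mean equals \(p_1 + \cdots + p_5\). The sketch assumes that draws from different urns are independent, which holds in this example; the theorem itself does not need that assumption.

```python
from fractions import Fraction

# Probability of drawing a white ball from each of the five urns of the example.
p = [Fraction(k, 10) for k in range(1, 6)]

# Distribution of the number m of white balls, built up one urn at a time
# (draws from different urns are independent here).
dist = [Fraction(1)]
for pk in p:
    new = [Fraction(0)] * (len(dist) + 1)
    for m, prob in enumerate(dist):
        new[m] += prob * (1 - pk)      # this urn gives a black ball
        new[m + 1] += prob * pk        # this urn gives a white ball
    dist = new

mean = sum(m * prob for m, prob in enumerate(dist))
print(mean, sum(p))                    # 3/2 3/2
```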

Problem 4. An urn contains $a$ white and $b$ black balls, and $c$ balls are
drawn. What is the mathematical expectation of the number of
white balls drawn?

Solution. To every ball taken we attach a variable which has the
value 1 if the extracted ball is white, and the value 0 otherwise. The
number of white balls drawn will then be

$$s = x_1 + x_2 + \cdots + x_c.$$

But the probability that the $i$th ball removed will be white, when nothing
is known of the other balls, is $a/(a + b)$; therefore

$$E(x_i) = 1 \cdot \frac{a}{a + b} + 0 \cdot \frac{b}{a + b} = \frac{a}{a + b}$$

for every $i$, and the required expectation is

$$E(s) = \frac{ca}{a + b}.$$
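As a cross-check of $E(s) = ca/(a + b)$, the sketch below sums $k$ times the probability of drawing exactly $k$ white balls without replacement (the hypergeometric probabilities). This is direct summation, a different route from the one in the text. The function name `expected_white` is a hypothetical helper, and $c \le a + b$ is assumed.

```python
from fractions import Fraction
from math import comb


def expected_white(a: int, b: int, c: int) -> Fraction:
    """E(number of white balls) when c balls are drawn without replacement
    from an urn holding a white and b black balls, summed over the exact
    probabilities of k white balls (math.comb returns 0 for impossible cases)."""
    total = comb(a + b, c)
    return sum(
        (k * Fraction(comb(a, k) * comb(b, c - k), total) for k in range(min(a, c) + 1)),
        Fraction(0),
    )


print(expected_white(3, 5, 4), Fraction(4 * 3, 3 + 5))  # 3/2 3/2
```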


Problem 5. An urn contains $n$ tickets numbered from 1 to $n$, and
$m$ tickets are drawn at a time. What is the mathematical expectation
of the sum of numbers on the tickets drawn?

Solution. Suppose that the $m$ tickets drawn from the urn are arranged
in a certain order, and a variable is attached to every ticket expressing
its number. Denoting the variable attached to the $i$th ticket by $x_i$,
the sum of the numbers on all $m$ tickets is

$$s = x_1 + x_2 + \cdots + x_m.$$


But when taken singly, the variable $x_i$ may represent any of the numbers
$1, 2, 3, \ldots, n$, the probability of its being equal to any one of these
numbers being $1/n$. By the definition of mathematical expectation, we
have

$$E(x_i) = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n + 1}{2},$$

and therefore

$$E(s) = \frac{m(n + 1)}{2}.$$

For example, taking the French lottery where $n = 90$ and $m = 5$, we
find for the mathematical expectation of the sum of numbers on all 5
tickets

$$E(s) = \frac{5 \cdot 91}{2} = 227.5.$$
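The formula $E(s) = m(n + 1)/2$ can be confirmed by brute force on small lotteries. The sketch below averages the ticket sum over every equally likely $m$-subset of $\{1, \ldots, n\}$. Enumerating all $C_{90}^{5}$ French-lottery draws would be slow, so only the formula is evaluated for that case. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_ticket_sum(n: int, m: int) -> Fraction:
    """Average of the ticket sum over all equally likely m-subsets of 1..n."""
    subsets = list(combinations(range(1, n + 1), m))
    return Fraction(sum(sum(s) for s in subsets), len(subsets))


# Brute force agrees with m(n + 1)/2 on small cases.
for n, m in [(6, 2), (8, 3), (10, 4)]:
    assert expected_ticket_sum(n, m) == Fraction(m * (n + 1), 2)

# French lottery, n = 90 and m = 5, from the formula alone.
print(float(Fraction(5 * 91, 2)))  # 227.5
```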


Problem 6. An urn contains $n$ tickets numbered from 1 to $n$. These
tickets are drawn one by one, so that a certain number appears in the
first place, another number in the second place, and so on. We shall say
that there is a "coincidence" when the number on a ticket corresponds
to the place it occupies. For instance, there is a coincidence when the
first ticket has number 1 or the second ticket has number 2, etc. Find
the mathematical expectation of the number of coincidences. Also, find
the probability that there will be none, or one, or two, etc., coincidences.

Solution. Let $x_i$ denote a variable which has the value 1 if there is a
coincidence in the $i$th place; otherwise $x_i = 0$. The sum

$$s = x_1 + x_2 + \cdots + x_n$$

gives the total number of coincidences, and

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But

$$E(x_i) = 1 \cdot \frac{1}{n} = \frac{1}{n}$$

because the probability of drawing a ticket with the number $i$ in the $i$th
place, without any regard to other tickets, obviously is $1/n$; therefore,

$$E(s) = n \cdot \frac{1}{n} = 1.$$
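A brute-force sketch confirms that the expected number of coincidences equals 1 whatever $n$ is. It averages, over all $n!$ equally likely orders, the count of places $i$ holding ticket $i$. Tickets and places are numbered from 0 in the code, which does not affect the count. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def mean_coincidences(n: int) -> Fraction:
    """Average number of places holding their own ticket, over all n! orders."""
    orders = list(permutations(range(n)))
    total = sum(
        sum(1 for place, ticket in enumerate(order) if place == ticket)
        for order in orders
    )
    return Fraction(total, len(orders))


for n in range(1, 7):
    print(n, mean_coincidences(n))
```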

On the other hand, denoting the probability of exactly $i$ coincidences by
$p_i$, we have by definition

$$E(s) = p_1 + 2p_2 + \cdots + np_n,$$

and, comparing with the preceding result, we obtain

$$p_1 + 2p_2 + \cdots + np_n = 1. \tag{3}$$

Let us denote by $\varphi(n)$ the probability that in drawing $n$ tickets we shall
have no coincidences. It is easy to express $p_i$ by means of $\varphi(n - i)$.
In fact, we have exactly $i$ coincidences in

$$C_n^i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i}$$

mutually exclusive cases; namely, when the tickets of one of the $C_n^i$
specified groups of $i$ tickets have numbers corresponding to their places
while the remaining $n - i$ tickets do not present coincidences at all.
By the theorem of compound probability, the probability of $i$ coincidences
in $i$ specified places is

$$\frac{1}{n} \cdot \frac{1}{n - 1} \cdots \frac{1}{n - i + 1},$$

and the probability of the absence of coincidences in the remaining $n - i$
places is $\varphi(n - i)$. The probability of exactly $i$ coincidences in $i$ specified
places is therefore

$$\frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)},$$

and the total probability of exactly $i$ coincidences without specification
of places is

$$p_i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i} \cdot \frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)},$$

or

$$p_i = \frac{\varphi(n - i)}{1 \cdot 2 \cdot 3 \cdots i}. \tag{4}$$


The symbol $\varphi(0)$ has no meaning, but the preceding formula holds
good even for $i = n$ if we assume $\varphi(0) = 1$.

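Substituting expression (4) for $p_i$ into (3), we reach the relation

$$\varphi(n - 1) + \frac{\varphi(n - 2)}{1!} + \frac{\varphi(n - 3)}{2!} + \cdots + \frac{\varphi(0)}{(n - 1)!} = 1;$$

or, changing $n$ into $n + 1$,

$$\varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1,$$

which gives successively $\varphi(1), \varphi(2), \varphi(3), \ldots$ by taking
$n = 1, 2, 3, \ldots$.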

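The general result, which can easily be verified, is

$$\varphi(n) = \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$

or, in an explicit form,

$$\varphi(n) = 1 - \frac{1}{1} + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots + \frac{(-1)^n}{1 \cdot 2 \cdot 3 \cdots n}.$$

Even for moderate $n$ this is very near to

$$\frac{1}{e} = 1 - 1 + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots \text{ ad inf.} = 0.36787944.$$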
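The sketch below runs the recurrence $\varphi(n) + \varphi(n-1)/1! + \cdots + \varphi(0)/n! = 1$ (with $\varphi(0) = 1$) forward in exact arithmetic. It checks the results against the explicit alternating sum and, for small $n$, against a direct count of orders with no coincidence. It also shows how close $\varphi(10)$ already is to $1/e$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def phi_recurrence(n_max):
    """phi(0..n_max) from phi(n) = 1 - sum_{k=1..n} phi(n - k)/k!, phi(0) = 1."""
    phi = [Fraction(1)]
    for n in range(1, n_max + 1):
        tail = sum(Fraction(phi[n - k], factorial(k)) for k in range(1, n + 1))
        phi.append(1 - tail)
    return phi


def phi_explicit(n):
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


def phi_brute(n):
    """Share of the n! orders in which no ticket sits in its own place."""
    orders = list(permutations(range(n)))
    none = sum(1 for o in orders if all(o[i] != i for i in range(n)))
    return Fraction(none, len(orders))


phi = phi_recurrence(10)
assert all(phi[n] == phi_explicit(n) for n in range(11))
assert all(phi[n] == phi_brute(n) for n in range(1, 8))
print(float(phi[10]), exp(-1))
```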


Mathematical Expectation of a Product
5. For the product of two or more stochastic variables we do not
possess anything so general as the foregoing theorem concerning the
mathematical expectation of sums. An analogous theorem with respect
to the product of stochastic variables can be established only under
certain restrictive conditions.

Several stochastic variables are called "independent" if the probability
for any one of them to assume a determined value does not depend
on the values assumed by the remaining variables. For instance, if the 
variables are the numbers of points on dice, they may be considered as 
independent. 

On the other hand, we have a case of dependent variables in numbers 
on tickets drawn in a lottery. For, in this case the fact that certain 
tickets have determined numbers precludes the possibility of any one of 
these numbers appearing on other tickets drawn at the same time. 

If more than two variables are independent according to the above 
definition, it is clear that any two of them are independent. But the 
converse is not true: It is easy to imagine cases when any two of the 
variables are independent and yet they are not independent when taken 
in their totality. Therefore, when speaking of independence of variables, 
we must always specify whether they are independent in their totality 
or only in pairs. 
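A standard illustration of variables that are independent in pairs but not in their totality is given below. It is not taken from this text. Here $x$ and $y$ are independent fair 0/1 variables, and $z$ is their exclusive-or. The sketch checks every pairwise product rule exactly. It then shows that the joint probability of $x = y = z = 1$ is 0 rather than $1/8$. The names `prob` and `independent_pair` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Hypothetical illustration: x, y independent fair 0/1 variables, z = x XOR y.
# Each of the four triples (x, y, z) has probability 1/4.
triples = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]


def prob(event):
    return sum((Fraction(1, 4) for t in triples if event(t)), Fraction(0))


def independent_pair(i, j):
    """Check P(u = s, v = w) == P(u = s) P(v = w) for coordinates i, j and all s, w."""
    return all(
        prob(lambda t: t[i] == s and t[j] == w)
        == prob(lambda t: t[i] == s) * prob(lambda t: t[j] == w)
        for s, w in product((0, 1), repeat=2)
    )


pairwise = all(independent_pair(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])

# In their totality they are dependent: x = y = 1 forces z = 0.
p_all_ones = prob(lambda t: t == (1, 1, 1))
product_of_marginals = (
    prob(lambda t: t[0] == 1) * prob(lambda t: t[1] == 1) * prob(lambda t: t[2] == 1)
)
print(pairwise, p_all_ones, product_of_marginals)  # True 0 1/8
```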

For two independent variables we have the following simple theorem: 

Theorem. The mathematical expectation of the product $xy$ of two
independent variables $x$ and $y$ is equal to the product of their expectations;
or, in symbols,

$$E(xy) = E(x)E(y).$$

Proof. Let $x_1, x_2, \ldots, x_m$ be the complete set of values for $x$, and
$y_1, y_2, \ldots, y_n$ the analogous set for $y$. Denote the probability of
$x$ being equal to $x_i$ by $p_i$, and similarly the probability of $y$ being equal
to $y_j$ by $q_j$. The events

$$x = x_i \quad \text{and} \quad y = y_j$$

are independent by the definition of independence, because the probability
of $x$ being equal to $x_i$ is not affected by the fact that $y$ has assumed any
one of its possible values, and it remains $p_i$.

By the theorem of compound probability, the simultaneous occurrence
of the events

$$x = x_i \quad \text{and} \quad y = y_j$$

has the probability $p_i q_j$. Again, by the extended definition of mathematical
expectation,

$$E(xy) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i q_j x_i y_j,$$

because the values of the product $xy$ are determined by $mn$ exhaustive
and mutually exclusive cases

$$x = x_i, \quad y = y_j; \qquad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n.$$

Now, performing the summation with respect to $j$ first, while $i$ remains
constant, we have

$$\sum_{j=1}^{n} p_i q_j x_i y_j = p_i x_i \sum_{j=1}^{n} q_j y_j = p_i x_i E(y),$$

and again

$$E(xy) = \sum_{i=1}^{m} p_i x_i E(y) = E(y) \sum_{i=1}^{m} p_i x_i,$$

or

$$E(xy) = E(x)E(y).$$

This theorem can be extended to the case of several factors independent
in their totality. For instance, if $x, y, z$ are independent, it is
obvious that $xy$ and $z$ are also independent. Hence

$$E(xyz) = E(xy)E(z),$$

and again

$$E(xyz) = E(x)E(y)E(z).$$

In a similar way we can extend this theorem to any number of independent
factors.

As an important application, let us consider two independent variables
$x$ and $y$ with the respective expectations $a$ and $b$. The variables $x - a$
and $y - b$ being independent also, we have

$$E(x - a)(y - b) = E(x - a)E(y - b);$$

but

$$E(x - a) = E(x) - a = a - a = 0;$$

therefore

$$E(x - a)(y - b) = 0. \tag{5}$$
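The theorem and its corollary $E(x - a)(y - b) = 0$ can be illustrated with two independent fair dice. The sketch evaluates the double sum from the proof exactly, with every pair of faces having probability $1/36$. For contrast, it also computes $E(x \cdot x)$ for the dependent pair $(x, x)$. In that case $91/6$ differs from $E(x)^2 = 49/4$, so the theorem does not apply. The dice example is an assumption chosen for illustration.

```python
from fractions import Fraction

# Two independent fair dice x and y: each of the 36 pairs (x_i, y_j)
# has probability p_i q_j = (1/6)(1/6), as in the proof above.
faces = range(1, 7)
p = Fraction(1, 6)

E_x = sum(p * f for f in faces)                              # E(x) = E(y) = 7/2
E_xy = sum(p * p * x * y for x in faces for y in faces)      # double sum of the proof
E_centered = sum(p * p * (x - E_x) * (y - E_x) for x in faces for y in faces)

# For contrast, the dependent pair (x, x): E(x * x) is not E(x) * E(x).
E_xx = sum(p * f * f for f in faces)

print(E_xy, E_x * E_x, E_centered, E_xx)  # 49/4 49/4 0 91/6
```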

Dispersion and Standard Deviation
6. Let $x$ be a variable and $a$ its mathematical expectation. The
expectation of

$$(x - a)^2$$

is called the "dispersion" of the variable, and the square root of the dispersion
is usually called the "standard deviation." As

$$(x - a)^2 = x^2 - 2ax + a^2,$$

we can apply the theorem on the expectation of sums to the right-hand
member of this identity and find

$$E(x - a)^2 = E(x^2) - 2aE(x) + a^2 = E(x^2) - a^2,$$

or, denoting by $b$ the expectation of $x^2$,

$$E(x - a)^2 = b - a^2. \tag{6}$$

Thus, the computation of dispersion can be reduced to the computation
of the expectations of the variable itself and of its square. Also, denoting
by $\sigma$ the standard deviation of $x$, we have the formula

$$\sigma^2 = b - a^2.$$

For instance, if the variable is the number of points on a die, we have

$$a = \frac{7}{2}, \qquad b = \frac{1^2 + 2^2 + \cdots + 6^2}{6} = \frac{91}{6},$$

and

$$\sigma^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} = 2.917; \qquad \sigma = 1.708.$$
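The die example can be reproduced exactly. The sketch computes the dispersion twice: through $\sigma^2 = b - a^2$, and directly as $E(x - a)^2$ from the definition. Both give $35/12$, and the rounded values match those in the text.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
a = Fraction(sum(faces), 6)                    # E(x) = 7/2
b = Fraction(sum(f * f for f in faces), 6)     # E(x^2) = 91/6
sigma2 = b - a * a                             # dispersion via sigma^2 = b - a^2
direct = sum(Fraction(1, 6) * (f - a) ** 2 for f in faces)  # E(x - a)^2 by definition

print(sigma2, direct, round(float(sigma2), 3), round(sqrt(float(sigma2)), 3))
# 35/12 35/12 2.917 1.708
```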

Dispersion of Sums

7. It is important to have a convenient formula to find the dispersion
of a sum

$$s = x_1 + x_2 + \cdots + x_n$$

of several stochastic variables. The expectation of $s$ is given by

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

or

$$E(s) = a_1 + a_2 + \cdots + a_n,$$

denoting by $a_i$ the expectation of $x_i$. The deviation of $s$ from its expectation
is, therefore,

$$x_1 + x_2 + \cdots + x_n - (a_1 + a_2 + \cdots + a_n),$$

and we have to find the expectation of

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Now we have identically

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 = \sum_{i=1}^{n} (x_i - a_i)^2 + 2\sum (x_i - a_i)(x_j - a_j),$$

the last sum being extended over all the different combinations of subscripts
$i$ and $j$ for which $i \neq j$, and consisting of $n(n - 1)/2$ terms.
The mathematical expectation of a sum being equal to the sum of the
expectations of its terms, we must find the expectations of the terms

$$(x_i - a_i)^2 \quad \text{and} \quad (x_i - a_i)(x_j - a_j).$$

The first is the dispersion of $x_i$ and can be found from (6); namely,

$$E(x_i - a_i)^2 = b_i - a_i^2 = \sigma_i^2$$

if $b_i$ is the expectation of $x_i^2$.

As to

$$E(x_i - a_i)(x_j - a_j),$$

instead of it we introduce the so-called "correlation coefficient" of $x_i$
and $x_j$:

$$R_{i,j} = \frac{E(x_i - a_i)(x_j - a_j)}{\sigma_i \sigma_j}.$$

Denoting the required dispersion by $D$, we obtain

$$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2R_{1,2}\sigma_1\sigma_2 + 2R_{1,3}\sigma_1\sigma_3 + \cdots + 2R_{n-1,n}\sigma_{n-1}\sigma_n, \tag{7}$$

so that the dispersion of a sum can be obtained as soon as we know the
dispersions of its terms and their correlation coefficients.

In an important case, expression (7) for the dispersion can be greatly
simplified. If the variables $x_1, x_2, \ldots, x_n$ are independent in pairs, we
see from (5) that all the correlation coefficients are 0, so that in this
case simply

$$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 = b_1 - a_1^2 + b_2 - a_2^2 + \cdots + b_n - a_n^2. \tag{8}$$

In other words, the dispersion of a sum of variables, any two of which
are independent, is equal to the sum of the dispersions of its terms.

8. A few examples will serve to illustrate the use of these formulas.
Problem 7. Find the dispersion of the number of successes in a series
of $n$ independent trials with probabilities $p_1, p_2, \ldots, p_n$ corresponding to
the first, second, ..., $n$th trial.

Solution. As in Prob. 2, we associate with every trial a variable which
assumes the value 1 or 0 according as the trial results in success or
failure. These variables $x_1, x_2, \ldots, x_n$ are independent because the
trials are supposed to be independent. The number of successes

$$m = x_1 + x_2 + \cdots + x_n$$

is thus the sum of independent variables. To find the dispersion of
any one of these variables $x_i$, we notice that, with $q_i = 1 - p_i$,

$$E(x_i) = 1 \cdot p_i + 0 \cdot q_i = p_i,$$
$$E(x_i^2) = 1 \cdot p_i + 0 \cdot q_i = p_i;$$

therefore the dispersion of $x_i$ is

$$\sigma_i^2 = p_i - p_i^2 = p_i q_i$$

and by (8)

$$D = E(m - p_1 - p_2 - \cdots - p_n)^2 = p_1 q_1 + p_2 q_2 + \cdots + p_n q_n.$$

In the Bernoullian case of independent trials with the same probability
$p$, we have $p_1 = p_2 = \cdots = p_n = p$ and

$$E(m - np)^2 = npq.$$

This formula is equivalent to the relation

$$\sum_{m=0}^{n} T_m (m - np)^2 = npq$$

established on page 116.
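The result $D = p_1q_1 + \cdots + p_nq_n$ can be checked against the exact distribution of $m$. The sketch builds that distribution one independent trial at a time: each trial either leaves the count unchanged or raises it by one. It then computes the dispersion by definition. The trial probabilities used are arbitrary assumptions. The last line covers the Bernoullian case with $n = 7$ and $p = 2/7$, where $npq = 10/7$.

```python
from fractions import Fraction


def success_distribution(probs):
    """Exact distribution of the number of successes m in independent trials
    whose success probabilities are listed in probs (one trial added at a time)."""
    dist = [Fraction(1)]                  # before any trial, m = 0 with certainty
    for p in probs:
        new = [Fraction(0)] * (len(dist) + 1)
        for k, w in enumerate(dist):
            new[k] += w * (1 - p)         # failure: m stays at k
            new[k + 1] += w * p           # success: m moves to k + 1
        dist = new
    return dist


def dispersion(dist):
    mean = sum(k * w for k, w in enumerate(dist))
    return sum((k - mean) ** 2 * w for k, w in enumerate(dist))


probs = [Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)]
print(dispersion(success_distribution(probs)), sum(p * (1 - p) for p in probs))

# Bernoullian case: seven trials with p = 2/7 give npq = 10/7.
print(dispersion(success_distribution([Fraction(2, 7)] * 7)))
```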

Problem 8. In a lottery $m$ tickets are drawn at a time out of $n$
tickets numbered from 1 to $n$. Find the dispersion of the sum $s$ of the
numbers on the tickets drawn.

Solution. Let $x_1, x_2, \ldots, x_m$ be the variables representing the
numbers on the first, second, ..., $m$th tickets. By Prob. 5 we know that

$$E(x_i) = \frac{n + 1}{2},$$

and in a similar way we find

$$E(x_i^2) = \frac{1^2 + 2^2 + \cdots + n^2}{n} = \frac{(n + 1)(2n + 1)}{6},$$

whence the dispersion of $x_i$ is

$$\sigma^2 = \frac{(n + 1)(2n + 1)}{6} - \left(\frac{n + 1}{2}\right)^2 = \frac{n^2 - 1}{12}.$$

Since we deal in the present case with dependent variables, we must
find the correlation coefficients, or, which is the same,

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right)$$

for every pair of subscripts $i$ and $j$. The variable $x_i$ may have any of
the values $1, 2, 3, \ldots, n$ with the same probability $1/n$, and $x_j$ may
have any of the same values, with the exception of that assumed by $x_i$,
with the probability

$$\frac{1}{n - 1},$$

so that the preceding expression consists of $n(n - 1)$ terms

$$\frac{1}{n(n - 1)}\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right),$$

where $x_j$, for given $x_i = 1, 2, \ldots, n$, ranges over all numbers $1, 2, 3, \ldots, n$
with the exception of $x_i$. As

$$\sum_{x_j \neq x_i} \left(x_j - \frac{n + 1}{2}\right) = -\left(x_i - \frac{n + 1}{2}\right),$$

it is obvious that

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right) = -\frac{1}{n(n - 1)} \sum_{x_i=1}^{n} \left(x_i - \frac{n + 1}{2}\right)^2$$

and

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right) = -\frac{n + 1}{12}.$$

Everything now is ready for the application of (7). With all simplifications
performed, we get the following expression for the required dispersion:

$$D = \frac{m(n^2 - 1)}{12} \cdot \frac{n - m}{n - 1}.$$

If the variables were independent, the dispersion would be

$$\frac{m(n^2 - 1)}{12}.$$

The dependence diminishes it, but the influence of dependence is not great
if the ratio $m/n$ is small.
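The dispersion $D = \frac{m(n^2 - 1)}{12} \cdot \frac{n - m}{n - 1}$ can be compared with a brute-force computation over all equally likely $m$-subsets of $\{1, \ldots, n\}$. The sketch does this for a few small lotteries. It then evaluates the formula for the French lottery ($n = 90$, $m = 5$) next to the value that independent variables would give. Both function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def sum_dispersion(n: int, m: int) -> Fraction:
    """Exact dispersion of the sum of m tickets drawn at once from 1..n."""
    sums = [sum(c) for c in combinations(range(1, n + 1), m)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)


def lottery_formula(n: int, m: int) -> Fraction:
    return Fraction(m * (n * n - 1), 12) * Fraction(n - m, n - 1)


for n, m in [(5, 2), (9, 3), (12, 5)]:
    assert sum_dispersion(n, m) == lottery_formula(n, m)

# French lottery: dependent draws versus the independent-variable value.
print(float(lottery_formula(90, 5)), 5 * (90**2 - 1) / 12)
```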

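Problems for Solution

1. Find the mathematical expectation $M$ of the absolute value of the discrepancy
$m - np$ in a series of $n$ independent trials with constant probability $p$. Ans. By
definition

$$M = \sum_{m=0}^{n} T_m |m - np|,$$

where, as usual,

$$T_m = \frac{n!}{m!(n - m)!} p^m q^{n-m}.$$

But since

$$\sum_{m=0}^{n} T_m (m - np) = 0,$$

we have also

$$M = 2 \sum_{m > np} T_m (m - np),$$

the sum being extended over all integers $m$ which are $> np$. Denoting by $F(x, y)$ the
sum

$$F(x, y) = \sum_{m > np} \frac{n!}{m!(n - m)!} x^m y^{n-m},$$

we have

$$\sum_{m > np} T_m (m - np) = p\frac{\partial F(p, q)}{\partial p} - npF(p, q).$$

On the other hand, by Euler's theorem on homogeneous functions,

$$nF(p, q) = p\frac{\partial F}{\partial p} + q\frac{\partial F}{\partial q},$$

whence, since $p + q = 1$,

$$\sum_{m > np} T_m (m - np) = pq\left(\frac{\partial F}{\partial p} - \frac{\partial F}{\partial q}\right) = npq\, C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$

Here $\mu$ represents an integer determined by

$$\mu \le np + 1 < \mu + 1.$$

The answer is therefore given by the simple formula

$$M = 2npq\, C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$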
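The closed form $M = 2npq\, C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}$, with $\mu$ the integer satisfying $\mu \le np + 1 < \mu + 1$ (that is, $\mu = \lfloor np \rfloor + 1$), can be tested against the definition $M = \sum_m T_m |m - np|$. The sketch uses exact fractions. It assumes $0 < p < 1$, which keeps $\mu \le n$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def mad_direct(n, p):
    """M = sum over m of T_m |m - np|, straight from the definition."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) * abs(m - n * p) for m in range(n + 1))


def mad_closed(n, p):
    """M = 2npq C(n-1, mu-1) p^(mu-1) q^(n-mu), with mu = floor(np) + 1."""
    q = 1 - p
    mu = floor(n * p) + 1
    return 2 * n * p * q * comb(n - 1, mu - 1) * p**(mu - 1) * q**(n - mu)


print(mad_direct(10, Fraction(1, 3)), mad_closed(10, Fraction(1, 3)))
```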

2. By applying Stirling's formula (Appendix 1, page 347) prove the following
result:

[formula illegible in the source scan]

where

[formula illegible in the source scan]

and $n$ is so large as to make [condition illegible in the source scan].
Hint: [formula illegible in the source scan], where $0 \le \theta \le 1$ and $\theta + \theta' = 1$.
3. What is the expectation of the number of failures preceding the first success in
an indefinite series of independent trials with the probability $p$?

Ans. $$qp + 2q^2p + 3q^3p + \cdots = \frac{qp}{(1 - q)^2} = \frac{q}{p}.$$
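A quick numerical look at the answer. The sketch takes $p = 1/3$, an arbitrary assumption, and shows that the partial sums of $qp + 2q^2p + 3q^3p + \cdots$ settle on $q/p = 2$. After 199 terms the remaining tail is far below $10^{-20}$.

```python
from fractions import Fraction

p = Fraction(1, 3)
q = 1 - p

# Partial sum of q p + 2 q^2 p + 3 q^3 p + ... up to k = 199.
partial = sum(k * q**k * p for k in range(1, 200))
print(float(partial), float(q / p))
```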

4. Balls are taken one by one out of an urn containing $a$ white and $b$ black balls
until the first white ball is drawn. What is the expectation of the number of black
balls preceding the first white ball?

Ans. 1. By direct application of the definition, the following first expression for the
required expectation $M$ is obtained:

$$M = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + 2\frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + 3\frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

Ans. 2. However, it is possible to find a simpler expression for $M$. Denote by $x_1$ the
number of black balls preceding the first white ball, by $x_2$ the number of black balls
between the first and second white ball, and so on; finally, by $x_{a+1}$ the number of black
balls following the last white ball. We have

$$x_1 + x_2 + \cdots + x_{a+1} = b$$

and

$$E(x_1) + E(x_2) + \cdots + E(x_{a+1}) = b.$$

But as the probability of every sequence of balls (that is, of every system of numbers
$x_1, x_2, \ldots, x_{a+1}$) is the same, namely,

$$\frac{a!\,b!}{(a + b)!},$$

it is easy to see that

$$E(x_1) = E(x_2) = \cdots = E(x_{a+1}) = M.$$

That is,

$$(a + 1)M = b$$

or

$$M = \frac{b}{a + 1}.$$

Equating this to the preceding expression for $M$, an interesting identity can be
obtained, whose direct proof is left to the student.
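The two answers can be compared numerically. The sketch evaluates the first expression, $\sum_k k \cdot P(\text{exactly } k \text{ black balls precede the first white})$, step by step in exact arithmetic. It then checks the result against $b/(a + 1)$ for a range of urns. This checks the identity mentioned above numerically but does not prove it. At least one white ball ($a \ge 1$) is assumed, and the function name is hypothetical.

```python
from fractions import Fraction


def black_before_first_white(a: int, b: int) -> Fraction:
    """First expression for M: sum over k of k * P(exactly k blacks precede the first white)."""
    total = Fraction(0)
    first_k_black = Fraction(1)                 # P(the first k balls are all black)
    for k in range(b + 1):
        white_next = Fraction(a, a + b - k)     # next ball white, given k blacks removed
        total += k * first_k_black * white_next
        first_k_black *= Fraction(b - k, a + b - k)
    return total


print(black_before_first_white(3, 5), Fraction(5, 3 + 1))
```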

5. In Prob. 6, page 168, to determine the probability $\varphi(n)$, we had an equation

$$\varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1; \qquad \varphi(0) = 1.$$

Find the general expression for $\varphi(n)$ using the method of generating functions. Ans.
Let

$$F(x) = \varphi(0) + \varphi(1)x + \varphi(2)x^2 + \cdots$$

be the generating function of $\varphi(n)$. Multiplying this series by

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots,$$

we find

$$e^x F(x) = 1 + x + x^2 + \cdots = \frac{1}{1 - x},$$

or

$$F(x) = \frac{e^{-x}}{1 - x},$$

whence

$$\varphi(n) = 1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}.$$

6. The total number of balls in an urn is known, but the number of white balls
depends on chance and only its mathematical expectation is known. Find the probability
of drawing a white ball. Ans. Let $N$ be the total number of balls and $M$ the
expectation of the number of white balls. The required probability is $M/N$.

7. Two urns contain, respectively, $a$ white and $b$ black and $\alpha$ white and $\beta$ black
balls. A certain number $c$ (naturally not exceeding $a + b$) of balls is transferred
from the first urn into the second. What is the probability of drawing a white ball
from the second urn after the transfer? Ans. The required probability is

$$\frac{\alpha + \dfrac{ca}{a + b}}{\alpha + \beta + c}.$$

8. An urn contains $a$ white and $b$ black balls. After a ball is drawn, it is to be
returned to the urn if it is white; but if it is black, it is to be replaced by a white ball
from another urn. What is the probability of drawing a white ball after the foregoing
operation has been repeated $x$ times? Ans. Denote by $M_x$ the expectation of the
number of white balls after $x$ operations. From the equation

$$M_{x+1} = M_x + 1 - \frac{M_x}{a + b}, \qquad M_0 = a,$$

the following expression for $M_x$ can be derived:

$$M_x = a + b - b\left(1 - \frac{1}{a + b}\right)^x.$$

It follows that the required probability is

$$\frac{M_x}{a + b} = 1 - \frac{b}{a + b}\left(1 - \frac{1}{a + b}\right)^x.$$
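The probability $1 - \frac{b}{a+b}\left(1 - \frac{1}{a+b}\right)^x$ can be checked without relying on the expectation argument. The sketch tracks the exact probability distribution of the number of white balls through $x$ operations; the total number of balls stays $a + b$ throughout. It then computes the chance that the next draw is white. The function names are hypothetical.

```python
from fractions import Fraction


def prob_white_after(a: int, b: int, x: int) -> Fraction:
    """Exact P(white) on a draw made after x operations, from the full
    distribution of the number of white balls (the total stays a + b)."""
    N = a + b
    dist = {a: Fraction(1)}                      # white count -> probability
    for _ in range(x):
        new = {}
        for w, pr in dist.items():
            # White drawn: it is returned, so the count is unchanged.
            new[w] = new.get(w, 0) + pr * Fraction(w, N)
            # Black drawn: it is replaced by a white ball, so the count rises by 1.
            if w < N:
                new[w + 1] = new.get(w + 1, 0) + pr * Fraction(N - w, N)
        dist = new
    return sum((pr * Fraction(w, N) for w, pr in dist.items()), Fraction(0))


def closed_form(a: int, b: int, x: int) -> Fraction:
    return 1 - Fraction(b, a + b) * (1 - Fraction(1, a + b)) ** x


print(prob_white_after(2, 3, 4), closed_form(2, 3, 4))
```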

9. Urns 1 and 2 contain, respectively, $a$ white and $b$ black and $c$ white and $d$ black
balls. One ball is taken from the first urn and transferred into the second, while
simultaneously one ball taken from the second urn is transferred into the first. What
is the probability of drawing a white ball from the first urn after such an exchange
has been repeated $x$ times? Ans. Let $M_x$ and $P_x$ represent the mathematical expectations
of the number of white balls in the first and second urn after $x$ exchanges. Then

$$M_x + P_x = a + c, \qquad M_{x+1} = M_x - \frac{M_x}{a + b} + \frac{P_x}{c + d},$$

whence

$$M_x = \frac{(a + c)(a + b)}{a + b + c + d} + \frac{ad - bc}{a + b + c + d}\left(1 - \frac{1}{a + b} - \frac{1}{c + d}\right)^x,$$

and the required probability is $M_x/(a + b)$.

10. An urn contains $pN$ white and $qN$ black balls, the total number of balls being
$N$. Balls are drawn one by one (without being returned to the urn) until a certain
number $n$ of balls is reached. What is the dispersion of the number $m$ of white balls
drawn? Ans. Let $x_i = 1$ if the $i$th ball drawn is white and $x_i = 0$ if it is black.
We have

$$E(x_i) = p, \qquad E(m) = np, \qquad E(x_i^2) = p$$

and

$$E(x_i - p)(x_j - p) = E(x_i x_j) - p^2 = -\frac{pq}{N - 1} \qquad (i \neq j).$$

The required dispersion is

$$D = E(m - np)^2 = npq\,\frac{N - n}{N - 1}.$$
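The answer $D = npq\,(N - n)/(N - 1)$ can be confirmed from the exact probabilities of drawing $k$ white balls without replacement. The sketch computes the dispersion by definition for every small urn and every number of draws. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def dispersion_white(N: int, white: int, n: int) -> Fraction:
    """Exact dispersion of the number of white balls among n drawn without
    replacement from N balls, `white` of which are white."""
    total = comb(N, n)
    probs = {k: Fraction(comb(white, k) * comb(N - white, n - k), total)
             for k in range(min(white, n) + 1)}
    mean = sum(k * pr for k, pr in probs.items())
    return sum((k - mean) ** 2 * pr for k, pr in probs.items())


def hypergeometric_formula(N: int, white: int, n: int) -> Fraction:
    p = Fraction(white, N)
    q = 1 - p
    return n * p * q * Fraction(N - n, N - 1)


print(dispersion_white(10, 4, 3), hypergeometric_formula(10, 4, 3))
```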


11. In a lottery containing $n$ numbers $(1, 2, 3, \ldots, n)$, $m$ numbers are drawn at a
time. Let $x_i$ represent the frequency of a specified number $i$ in $N$ drawings. Prove
that

$$E(x_i) = Np, \qquad E(x_i - Np)^2 = Npq,$$
$$E(x_i - Np)(x_j - Np) = Np(p' - p) \qquad (i \neq j),$$

where

$$p = \frac{m}{n}, \qquad q = 1 - p, \qquad p' = \frac{m - 1}{n - 1}.$$

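12. Let

$$z_i = (x_i - Np)^2 - Npq.$$

Show that the dispersion of the sum

$$z_1 + z_2 + \cdots + z_n$$

is

$$D = \frac{2N(N - 1)}{n - 1}(npq)^2.$$

Indication of the Proof. Let $N$ variables $\xi_1, \xi_2, \ldots, \xi_N$ be defined as follows:

$\xi_k = -p$ if in the $k$th drawing the number $i$ fails to appear;
$\xi_k = q$ if in the $k$th drawing the number $i$ appears.

In a similar way, we can define $N$ variables $\eta_1, \eta_2, \ldots, \eta_N$ associated with the
number $j \neq i$. Since

$$x_i - Np = \xi_1 + \xi_2 + \cdots + \xi_N,$$
$$x_j - Np = \eta_1 + \eta_2 + \cdots + \eta_N,$$

we have

$$e^{(x_i - Np)u + (x_j - Np)v} = \prod_{k=1}^{N} e^{\xi_k u + \eta_k v}.$$

The variables

$$e^{\xi_k u + \eta_k v} \qquad (k = 1, 2, \ldots, N)$$

being independent, we have

$$E\left(e^{(x_i - Np)u + (x_j - Np)v}\right) = \prod_{k=1}^{N} E\left(e^{\xi_k u + \eta_k v}\right).$$

But

$$E(e^{\xi_1 u + \eta_1 v}) = E(e^{\xi_2 u + \eta_2 v}) = \cdots = E(e^{\xi_N u + \eta_N v})$$
$$= pp'e^{q(u + v)} + p(1 - p')e^{qu - pv} + p(1 - p')e^{-pu + qv} + (q - p + pp')e^{-p(u + v)} = F(u, v).$$

Hence

$$E\left(e^{(x_i - Np)u + (x_j - Np)v}\right) = F(u, v)^N.$$

It suffices to expand both members into power series in $u$ and $v$ and compare the terms
involving $u^2v^2$ to find

$$E(z_i z_j), \qquad i \neq j.$$

The rest does not present serious difficulties except for somewhat complicated
calculations.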

13. A box contains $2^n$ tickets, among which $C_n^i$ tickets bear the number $i$
($i = 0, 1, 2, \ldots, n$). A group of $m$ tickets is drawn; denoting by $s$ the sum of their
numbers, it is required to find the expectation $E$ and the dispersion $D$ of $s$.

Ans. $$E = \frac{mn}{2}; \qquad D = \frac{mn}{4} - \frac{m(m - 1)n}{4(2^n - 1)}.$$
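The answers $E = mn/2$ and $D = mn/4 - m(m-1)n/(4(2^n - 1))$ can be verified by enumeration on small boxes. The sketch treats the $2^n$ tickets as distinct objects, so every group of $m$ positions is equally likely. It then evaluates the formulas for $n = 6$, $m = 10$, which is the setting of the author's ticket experiment described later in the chapter. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def tickets(n):
    """The box of Prob. 13: C(n, i) tickets bearing the number i."""
    return [i for i in range(n + 1) for _ in range(comb(n, i))]


def exact_mean_and_dispersion(n, m):
    sums = [sum(c) for c in combinations(tickets(n), m)]   # tickets are distinct objects
    mean = Fraction(sum(sums), len(sums))
    return mean, sum((s - mean) ** 2 for s in sums) / len(sums)


def formulas(n, m):
    return Fraction(m * n, 2), Fraction(m * n, 4) - Fraction(m * (m - 1) * n, 4 * (2**n - 1))


for n, m in [(3, 2), (3, 4), (4, 3)]:
    assert exact_mean_and_dispersion(n, m) == formulas(n, m)

E, D = formulas(6, 10)
print(E, D, round(float(D), 3))  # 30 90/7 12.857
```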


14. A box contains $k$ varieties of objects, the number of objects of each variety
being the same. These objects are drawn one at a time and put back before the
next drawing. Denoting by $n$ the smallest number of drawings which produce
objects of all varieties, find $E(n)$ and $E(n^2)$. Ans.

$$E(n) = k\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right),$$

$$E(n^2) - E(n)^2 = k^2\left(1 + \frac{1}{2^2} + \cdots + \frac{1}{k^2}\right) - k\left(1 + \frac{1}{2} + \cdots + \frac{1}{k}\right).$$

Use the result of Prob. 12, p. 41.
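The formulas can be cross-checked by a different, standard route, not the one via Prob. 12 that the text suggests. Suppose $s$ varieties have already been seen. The number of further drawings needed for a new variety then follows a geometric law with success chance $(k - s)/k$. So $n$ is a sum of $k$ independent geometric waiting times, with means $1/p$ and dispersions $(1 - p)/p^2$. The sketch compares this decomposition with the stated answers. For $k = 4$ (four suits) it gives $25/3 = 8\frac{1}{3}$ and $130/9 \approx 14.44$. Function names are hypothetical.

```python
from fractions import Fraction


def collector_moments(k: int):
    """E(n) and E(n^2) - E(n)^2 from n = sum of k independent geometric waits."""
    mean = Fraction(0)
    var = Fraction(0)
    for seen in range(k):
        p = Fraction(k - seen, k)       # chance that a drawing yields a new variety
        mean += 1 / p                   # mean of a geometric wait
        var += (1 - p) / p**2           # dispersion of a geometric wait
    return mean, var


def book_formulas(k: int):
    h1 = sum(Fraction(1, i) for i in range(1, k + 1))
    h2 = sum(Fraction(1, i * i) for i in range(1, k + 1))
    return k * h1, k * k * h2 - k * h1


mean, var = collector_moments(4)
print(mean, var)  # 25/3 130/9
```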
CHAPTER X


THE LAW OF LARGE NUMBERS 

1. The developments of the preceding chapter, combined with a
simple lemma due to Tshebysheff, lead in a natural and easy way to a
far-reaching generalization of Bernoulli's theorem, known under the
name of the "law of large numbers."

Tshebysheff's Lemma. Let $u$ be a variable which does not assume
negative values, and $a$ its mathematical expectation. The probability of the
inequality

$$u \le at^2$$

is always greater than

$$1 - \frac{1}{t^2},$$

whatever $t$ may be.

Proof. Let

$$u_1, u_2, \ldots, u_n$$

be all the possible values of the variable $u$ and

$$p_1, p_2, \ldots, p_n$$

their respective probabilities. By the definition of mathematical expectation,
we have

$$p_1u_1 + p_2u_2 + \cdots + p_nu_n = a. \tag{1}$$

We may suppose the notations so chosen that

$$u_1, u_2, \ldots, u_\alpha$$

are all the values of $u$ which are $\le at^2$, the remaining values

$$u_{\alpha+1}, u_{\alpha+2}, \ldots, u_n$$

being $> at^2$. If all the terms in (1) with subscripts $1, 2, \ldots, \alpha$ are
dropped, the left-hand member can only be diminished, since these
terms are positive or at least nonnegative by hypothesis. We have,
therefore,

$$p_{\alpha+1}u_{\alpha+1} + \cdots + p_nu_n \le a.$$

But as

$$u_i > at^2$$

for $i = \alpha + 1, \alpha + 2, \ldots, n$, a still stronger inequality,

$$at^2(p_{\alpha+1} + \cdots + p_n) < a,$$

or

$$p_{\alpha+1} + \cdots + p_n < \frac{1}{t^2},$$

will hold.

Here the left-hand member represents the probability $Q$ of the
inequality

$$u > at^2,$$

because this inequality can materialize only in the following mutually
exclusive forms: either $u = u_{\alpha+1}$, or $u = u_{\alpha+2}$, ..., or $u = u_n$, whose
probabilities are, respectively, $p_{\alpha+1}, p_{\alpha+2}, \ldots, p_n$. Thus

$$Q < \frac{1}{t^2}.$$

But if $P$ is the probability of the opposite event

$$u \le at^2,$$

we must have

$$P + Q = 1,$$

whence

$$P > 1 - \frac{1}{t^2},$$

which proves the lemma.
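The lemma can be illustrated on a concrete distribution. The sketch picks a hypothetical nonnegative variable with five values and computes its expectation $a$. For several values of $t$ it compares the exact probability of $u \le at^2$ with the bound $1 - 1/t^2$. The values and probabilities are invented for illustration.

```python
from fractions import Fraction

# Hypothetical nonnegative variable: values u_i with probabilities p_i.
values = [0, 1, 2, 5, 20]
probs = [Fraction(3, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10), Fraction(1, 10)]
a = sum(p * u for p, u in zip(probs, values))      # mathematical expectation, 16/5

for t in [Fraction(11, 10), Fraction(3, 2), Fraction(2), Fraction(3)]:
    P = sum(p for p, u in zip(probs, values) if u <= a * t * t)   # P(u <= a t^2)
    bound = 1 - 1 / (t * t)
    assert P > bound                                   # Tshebysheff's lemma
    print(float(t), float(P), float(bound))
```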

2. Let $x_1, x_2, \ldots, x_n$ be a set of stochastic variables and $a_1, a_2, \ldots, a_n$
their respective expectations. The dispersion of the sum

$$x_1 + x_2 + \cdots + x_n,$$

which we shall denote by $B_n$, is, by definition, the mathematical expectation
of the variable

$$u = (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Tshebysheff's lemma, applied to this variable $u$, shows that the probability
of the inequality

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 \le B_n t^2$$

is greater than

$$1 - \frac{1}{t^2}.$$

But the preceding inequality is equivalent to two inequalities

$$-t\sqrt{B_n} \le x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n \le t\sqrt{B_n}$$

or, dividing through by $n$,

$$-\frac{t\sqrt{B_n}}{n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le \frac{t\sqrt{B_n}}{n}.$$

Hence, the probability of these inequalities for an arbitrary positive $t$
is greater than

$$1 - \frac{1}{t^2}.$$

Let $\varepsilon$ be an arbitrary positive number. Defining $t$ by the equation

$$\frac{t\sqrt{B_n}}{n} = \varepsilon,$$

whence

$$\frac{1}{t^2} = \frac{B_n}{n^2\varepsilon^2},$$

we arrive at the following conclusion: The probability $P$ of the inequalities

$$-\varepsilon \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le \varepsilon,$$

equivalent to a single inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n}\right| \le \varepsilon,$$

is greater than

$$1 - \frac{B_n}{n^2\varepsilon^2}.$$

Thus far nothing has been supposed about the behavior of $B_n$ for
indefinitely increasing $n$. We shall now suppose that the quotient
$B_n/n^2$ tends to 0 as $n$ increases indefinitely. Then, having chosen two
arbitrarily small positive numbers $\varepsilon$ and $\eta$, a number $n_0$ can be found so
that the inequality

$$\frac{B_n}{n^2\varepsilon^2} < \eta$$

will hold for $n > n_0$. Consequently, we shall have

$$P > 1 - \eta$$

for all $n > n_0$. This conclusion leads to the following important theorem,
due in the main to Tshebysheff:
The Law of Large Numbers. With probability approaching 1, or
certainty, as near as we please, we may expect that the arithmetic mean of
the values actually assumed by $n$ stochastic variables will differ from the arithmetic
mean of their expectations by less than any given number, however small,
provided the number of variables can be taken sufficiently large and provided
the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty$$

is fulfilled.

If, instead of the variables $x_i$, we consider new variables $z_i = x_i - a_i$
with their means $= 0$, the same theorem can be stated as follows:

For a fixed $\varepsilon > 0$, however small, the probability of the inequality

$$\left|\frac{z_1 + z_2 + \cdots + z_n}{n}\right| \le \varepsilon$$

tends to 1 as a limit when $n$ increases indefinitely, provided

$$\frac{B_n}{n^2} \to 0.$$

This theorem is very general. It holds for independent or dependent
variables indifferently, if the sufficient condition for its validity, namely,
that

$$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty,$$

is fulfilled.
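To see the bound $P > 1 - B_n/(n^2\varepsilon^2)$ at work, the sketch below takes the simplest case as an assumed example. The variables are independent 0/1 trials with probability $1/2$, so $B_n = npq$. It computes the exact probability that the mean $m/n$ lies within $\varepsilon = 1/10$ of $1/2$ and compares it with the bound. The bound is crude, and even negative for small $n$, but both quantities tend to 1 as $n$ grows. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_mean_within(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Exact P(|m/n - p| <= eps) for the number m of successes in n independent trials."""
    q = 1 - p
    return sum(
        (comb(n, m) * p**m * q**(n - m) for m in range(n + 1) if abs(Fraction(m, n) - p) <= eps),
        Fraction(0),
    )


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in [10, 50, 100, 200]:
    B_n = n * p * (1 - p)                     # dispersion of the sum of n indicators
    bound = 1 - B_n / (n * n * eps * eps)     # Tshebysheff bound (negative for small n)
    exact = prob_mean_within(n, p, eps)
    assert exact > bound
    print(n, float(bound), float(exact))
```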

3. This condition, which is recognized as sufficient, is at the same
time necessary if the variables $z_1, z_2, \ldots, z_n$ are uniformly bounded;
that is, if a constant number $C$ (one independent of $n$) can be found
so that all particular values of $z_i$ ($i = 1, 2, \ldots, n$) are numerically less
than $C$. Let $P$, as before, denote the probability of the inequality

$$|z_1 + z_2 + \cdots + z_n| \le n\varepsilon.$$

Then the probability of the opposite inequality

$$|z_1 + z_2 + \cdots + z_n| > n\varepsilon$$

will be $1 - P$.

Now, by definition,

$$B_n = E(z_1 + z_2 + \cdots + z_n)^2.$$

The square of the sum never exceeds $n^2C^2$, and it does not exceed $n^2\varepsilon^2$
when the first inequality holds; whence

$$B_n \le n^2C^2(1 - P) + n^2\varepsilon^2 P,$$

from which it follows that

$$\frac{B_n}{n^2} \le C^2(1 - P) + \varepsilon^2 P \le \varepsilon^2 + C^2(1 - P).$$

If the law of large numbers holds, $1 - P$ converges to 0 when $n$
increases indefinitely. Since $\varepsilon$ is arbitrary, the right-hand member for sufficiently
large $n$ becomes less than any given number, and that implies

$$\frac{B_n}{n^2} \to 0,$$

which proves the statement.

4. There is an important case in which the law of large numbers
certainly holds; namely, when the variables $x_1, x_2, \ldots, x_n$ are independent
and the expectations of their squares are bounded. Then a constant
number $C$ exists such that

$$b_i = E(x_i^2) < C \qquad \text{for } i = 1, 2, 3, \ldots.$$

On the other hand, for independent variables

$$B_n = \sum_{i=1}^{n} (b_i - a_i^2) \le \sum_{i=1}^{n} b_i < nC$$

and

$$\frac{B_n}{n^2} < \frac{C}{n} \to 0.$$

The expectations of squares are bounded, for instance, when all the
variables are uniformly bounded, which is true, in particular, for "identical"
or "equal" variables. Variables are said to be identical if they
possess the same set of values with the same corresponding probabilities.
possess the same set of values with the same corresponding probabilities. 

5. E. Czuber made a complete investigation of the results of 2,854
drawings in a lottery operated in Prague between 1754 and 1886. It
consisted of 90 numbers, of which 5 were taken in each drawing. From
Czuber's book "Wahrscheinlichkeitsrechnung," vol. 1, p. 141 (2d ed.,
1908), we reprint the following table.

Numbers              Their frequency m    Difference m - 158
6                    138                  -20
39, 65               139                  -19
16, 41, 76, 87       142                  -16
2, 14, 56, 79, 86    143                  -15
18, 44, 47           144                  -14
72, 80               145                  -13
12                   146                  -12
21, 53               147                  -11
70                   149                   -9
24, 32, 55, 69       150                   -8
27, 64, 75           151                   -7
81                   152                   -6
23, 29, 85           153                   -5
19, 35, 42, 74       154                   -4
7, 20, 59            155                   -3
13, 34, 40, 67, 88   156                   -2
11, 52, 68           157                   -1
17, 82               158                    0
15, 90               159                    1
58                   160                    2
8, 25, 36            161                    3
22                   162                    4
33, 57               163                    5
51                   164                    6
3, 43, 45, 48        165                    7
10, 26, 66           166                    8
1, 5, 60, 84         167                    9
50, 62               168                   10
9, 61, 63            170                   12
54, 73               171                   13
49, 71, 78           172                   14
28                   173                   15
37                   176                   18
30, 46               177                   19
89                   178                   20
31                   179                   21
38                   184                   26
4                    185                   27
77                   186                   28
83                   189                   31

With the 2,854 drawings we associate 2,854 variables $x_1, x_2, \ldots, x_{2854}$,
representing the sums of the five numbers appearing in each of the 2,854
drawings. These variables are identical and independent, with the
common mathematical expectation 227.5. Hence, by the law of large
numbers, we can expect that the arithmetic mean of the actually observed
values of these variables will not differ notably from 227.5. To form
the sum

$$S = \sum_{i=1}^{2854} x_i,$$

we must multiply the frequencies given in the preceding table by the
sum of the corresponding numbers. To simplify the task, we notice that all
numbers from 1 to 90 actually appeared. Hence, we multiply the
sum of these numbers, 4,095, by 158, which gives:

4095 • 158 = 647,010, 
and then add to this number the sum of the differences $m - 158$, each multiplied
by the sum of the numbers in the same line. The results are:

Sum of positive products: 22,336
Sum of negative products: -19,587.

Hence

$$S = 647{,}010 + 22{,}336 - 19{,}587 = 649{,}759$$

and

$$\frac{S}{2854} = 227.67,$$

which differs very little from the expected value 227.5. An even larger
difference would be in perfect agreement with the law of large numbers,
since 2,854, the number of variables, is not very great.
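The arithmetic above can be rechecked mechanically. In the sketch below, Czuber's table is transcribed by hand into a Python dict mapping each frequency $m$ to the numbers drawn that often. The code confirms that each number from 1 to 90 occurs exactly once. It computes $S$ directly as the sum of frequency times number, and also by the shortcut used in the text, $4095 \cdot 158$ plus the two correction totals.

```python
# Czuber's table, transcribed from above: frequency m -> numbers drawn m times.
table = {
    138: [6], 139: [39, 65], 142: [16, 41, 76, 87], 143: [2, 14, 56, 79, 86],
    144: [18, 44, 47], 145: [72, 80], 146: [12], 147: [21, 53], 149: [70],
    150: [24, 32, 55, 69], 151: [27, 64, 75], 152: [81], 153: [23, 29, 85],
    154: [19, 35, 42, 74], 155: [7, 20, 59], 156: [13, 34, 40, 67, 88],
    157: [11, 52, 68], 158: [17, 82], 159: [15, 90], 160: [58], 161: [8, 25, 36],
    162: [22], 163: [33, 57], 164: [51], 165: [3, 43, 45, 48], 166: [10, 26, 66],
    167: [1, 5, 60, 84], 168: [50, 62], 170: [9, 61, 63], 171: [54, 73],
    172: [49, 71, 78], 173: [28], 176: [37], 177: [30, 46], 178: [89], 179: [31],
    184: [38], 185: [4], 186: [77], 189: [83],
}

# Every number from 1 to 90 occurs exactly once in the table.
numbers = sorted(num for row in table.values() for num in row)
assert numbers == list(range(1, 91))

# Direct route: S = sum over all numbers of (frequency * number).
S = sum(m * num for m, row in table.items() for num in row)

# The text's shortcut: 4095 * 158 plus (m - 158) times each row sum.
positive = sum((m - 158) * sum(row) for m, row in table.items() if m > 158)
negative = sum((m - 158) * sum(row) for m, row in table.items() if m < 158)
assert S == 4095 * 158 + positive + negative

print(positive, negative, S, round(S / 2854, 2))  # 22336 -19587 649759 227.67
```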

6. The two experiments reported in this section were made by the
author in spare moments. In the first experiment, 64 tickets bearing the
numbers 0, 1, 2, 3, 4, 5, 6 and occurring in the following proportions:

Number      0   1   2   3   4   5   6
Frequency   1   6  15  20  15   6   1

were vigorously agitated in a tin can, and then 10 tickets were drawn at a
time and their numbers added. Altogether 2,500 such drawings were
made and their results carefully recorded. From these records we
derive Tables I and II.


Table I

Number   Frequency observed   Expected frequency   Discrepancy
0               404                390.625            +13.375
1             2,321              2,343.75             -22.75
2             5,850              5,859.375             -9.375
3             7,863              7,812.5              +50.5
4             5,821              5,859.375            -38.375
5             2,344              2,343.75              +0.25
6               397                390.625             +6.375


The next table gives the absolute values of the differences $s - 30$, where $s$
is the sum of the numbers on 10 tickets drawn at one time, and their
respective frequencies.

Table II

|s - 30|   Frequency observed        |s - 30|   Frequency observed
   0              246                    7               71
   1              549                    8               44
   2              479                    9               25
   3              379                   10                8
   4              324                   11                4
   5              241                   12                1
   6              129

From Table I it is easy to find that the arithmetic mean of all 2,500
sums observed is

$$\frac{74996}{2500} = 29.9984,$$

whereas the expectation of each of the 2,500 identical variables under
consideration is 30, by Prob. 13, page 181. By the same problem, the
dispersion of $s$, that is, $E(s - 30)^2$, is 12.857. On the other hand, from
Table II we find that

$$\sum (s - 30)^2 = 31477$$

and

$$\frac{\sum (s - 30)^2}{2500} = 12.5908,$$

fairly close to 12.857.
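The summary figures can be recomputed from Tables I and II. The sketch below transcribes both tables by hand. It checks the totals (25,000 tickets and 2,500 sums) and recomputes the expected frequencies $25000 \cdot C_6^k/64$. It then derives the observed mean of $s$ and the mean of $(s - 30)^2$.

```python
from fractions import Fraction
from math import comb

# Table I: how often each ticket number turned up among the 25,000 tickets drawn.
observed = {0: 404, 1: 2321, 2: 5850, 3: 7863, 4: 5821, 5: 2344, 6: 397}
drawings, per_draw = 2500, 10
expected = {k: Fraction(drawings * per_draw * comb(6, k), 64) for k in observed}
discrepancy = {k: observed[k] - expected[k] for k in observed}

# Table II: frequency of |s - 30| over the 2,500 sums.
abs_dev = {0: 246, 1: 549, 2: 479, 3: 379, 4: 324, 5: 241, 6: 129,
           7: 71, 8: 44, 9: 25, 10: 8, 11: 4, 12: 1}

mean_s = Fraction(sum(k * f for k, f in observed.items()), drawings)
sum_sq = sum(d * d * f for d, f in abs_dev.items())
print(float(mean_s), sum_sq, float(Fraction(sum_sq, drawings)))  # 29.9984 31477 12.5908
```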

In the second experiment we tried to produce cards of every suit in $n$
drawings ($n$ being the smallest number required) of one card at a time,
each card taken being returned before the next drawing. By Prob. 14,
page 181, we find that the expectation and the dispersion of this number
$n$ are, respectively, $8\frac{1}{3}$ and 14.44. Altogether 3,000 values of $n$ were
recorded, of which 33 was the largest. Values of the difference $n - 8$ are
given in Table III.


Table III

n - 8   Frequency     n - 8   Frequency     n - 8   Frequency
 -4        282           6        77          16         3
 -3        420           7        50          17         5
 -2        426           8        40          18         2
 -1        407           9        31          19         1
  0        348          10        17          20         3
  1        247          11        15          21         1
  2        228          12        13          22         1
  3        156          13         6          23         1
  4        116          14         9          24         0
  5         88          15         6          25         1

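From this table we find

$$\sum (n - 8) = 965, \qquad \sum (n - 8)^2 = 43{,}395,$$

whence

$$\sum \left(n - 8\tfrac{1}{3}\right)^2 = \sum (n - 8)^2 - \frac{2}{3}\sum (n - 8) + \frac{3000}{9} = 43{,}085,$$

$$\sum n = 24{,}965.$$

By the law of large numbers we may expect that the quotients

$$\frac{\sum n}{3000}, \qquad \frac{\sum \left(n - 8\tfrac{1}{3}\right)^2}{3000}$$

will not differ considerably from $8\frac{1}{3}$ and 14.44, respectively. As a
matter of fact,

$$\frac{\sum n}{3000} = 8.322, \qquad \frac{\sum \left(n - 8\tfrac{1}{3}\right)^2}{3000} = 14.362.$$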

3000 
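The shift of centre from 8 to $8\tfrac{1}{3}$ used above is a small piece of algebra. The sketch below redoes it in exact rational arithmetic, starting only from the two totals quoted in the text, $\sum (n-8) = 965$ and $\sum (n-8)^2 = 43{,}395$. It does not re-add the frequencies of Table III, so it checks only the algebra and the two quotients.

```python
from fractions import Fraction

# Totals quoted in the text for the 3,000 recorded values of n.
count = 3000
sum_d = 965           # sum of (n - 8)
sum_d2 = 43_395       # sum of (n - 8)^2
mu = Fraction(25, 3)  # theoretical expectation 8 1/3

sum_n = 8 * count + sum_d  # sum of n
# sum (n - mu)^2 = sum (n - 8)^2 - 2 (mu - 8) sum (n - 8) + count (mu - 8)^2
shift = mu - 8
sum_c2 = sum_d2 - 2 * shift * sum_d + count * shift ** 2

print(sum_n, sum_c2)                                              # 24965 43085
print(round(sum_n / count, 3), round(float(sum_c2) / count, 3))   # 8.322 14.362
```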


There is a very satisfactory agreement between the theory and this 
experiment in another respect. Of 24,965 cards drawn there were 


6,304 hearts 
6,236 diamonds 
6,131 clubs 
6,294 spades 

whereas the expected number for each suit is 6241.25. 

7. So far, we have dealt with stochastic variables having only a finite number of values. However, the notion of mathematical expectation, and the propositions essentially based on this notion, can be extended to variables with infinitely many values. Here we shall consider the simplest case of variables with a countable set of values that can be arranged in a sequence

$$\cdots < \alpha_{-2} < \alpha_{-1} < \alpha_0 < \alpha_1 < \alpha_2 < \cdots$$

in the order of their magnitude.

With this sequence is associated the sequence of probabilities

$$\ldots,\ p_{-2},\ p_{-1},\ p_0,\ p_1,\ p_2,\ \ldots$$

so that in general $p_i$ is the probability for x to assume the value $\alpha_i$. These probabilities are subject to the condition that the series

$$\sum p_i = \cdots + p_{-2} + p_{-1} + p_0 + p_1 + p_2 + \cdots$$

must be convergent with the sum 1.

The definition of mathematical expectation is essentially the same as that for variables with a finite number of values, but instead of a finite sum we have an infinite series

$$E(x) = \sum p_i \alpha_i,$$

provided this series is convergent (it is absolutely convergent if it is convergent at all). If this series is divergent, it is meaningless to speak of the mathematical expectation of x. Likewise, the mathematical expectation of any function $\varphi(x)$ is defined as the sum of the series

$$E\{\varphi(x)\} = \sum p_i \varphi(\alpha_i),$$

provided the latter is convergent.

It can easily be seen that the various theorems established in Chap. IX, as well as Tshebysheff's lemma, continue to hold when the mathematical expectations involved exist.

The law of large numbers follows, as a simple corollary, from Tshebysheff's lemma if the following requirements are fulfilled:

a. The mathematical expectations of all variables $x_1, x_2, x_3, \ldots$ exist.
b. The dispersion $B_n$ of the sum $x_1 + x_2 + \cdots + x_n$ exists.
c. The quotient $B_n/n^2$ tends to 0 as n tends to infinity.

The first requirement is absolutely indispensable: without it the theorem itself cannot be stated. The second requirement (not to speak of the third) need not be fulfilled, and still the law of large numbers may hold, as Markoff pointed out.

8. Let $x_1, x_2, x_3, \ldots$ be independent variables. If for every i the mathematical expectation

$$E(x_i^2)$$

exists, the quantity $B_n$ exists also. But if at least one of these expectations does not exist, the quantity $B_n$ has no meaning. However, the following theorem, due to Markoff, holds:

Theorem. The law of large numbers holds, provided that for some $\delta > 0$ all the mathematical expectations

$$E|x_i|^{1+\delta}; \qquad i = 1, 2, 3, \ldots$$

exist and are bounded.

Proof. For the sake of simplicity we may assume that

$$E(x_i) = 0; \qquad i = 1, 2, 3, \ldots$$

For, supposing

$$E(x_i) = a_i; \qquad i = 1, 2, 3, \ldots,$$

instead of $x_i$ we may consider the new variables

$$z_i = x_i - a_i.$$

Then

$$E(z_i) = 0,$$

and it remains to prove the existence and boundedness of

$$E|z_i|^{1+\delta}; \qquad i = 1, 2, 3, \ldots$$

The proof follows immediately from the inequalities

$$|z_i|^{1+\delta} \le 2^{\delta}\left(|x_i|^{1+\delta} + |a_i|^{1+\delta}\right), \qquad |a_i|^{1+\delta} \le E|x_i|^{1+\delta},$$

the first of which is well known; the second is a particular case of Liapounoff's inequality, established in Chap. XIII, page 265.

Thus, from the outset we are entitled to assume that

$$E(x_i) = 0.$$

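The proof of the theorem is based on a very ingenious and useful device due to Markoff. Let N be a positive number which later we shall increase indefinitely. Together with $x_i$ we shall consider two new variables, $u_i$ and $v_i$, defined as follows: $\alpha$ being a particular value of $x_i$, the corresponding values of $u_i$ and $v_i$ are

$$u_i = \alpha, \quad v_i = 0 \qquad \text{if } |\alpha| \le N$$

and

$$u_i = 0, \quad v_i = \alpha \qquad \text{if } |\alpha| > N.$$

Thus, the stochastic variables $u_i$ and $v_i$ are completely defined. Evidently

$$x_i = u_i + v_i,$$

whence

$$0 = E(u_i) + E(v_i)$$

and

$$\beta_i = E(u_i) = -E(v_i). \tag{1}$$

Now, c denoting an upper bound of the expectations in the theorem,

$$E|x_i|^{1+\delta} < c$$

by hypothesis. Since $v_i$ is either 0 or its absolute value is $> N$, we have

$$N^{\delta}E|v_i| \le E|v_i|^{1+\delta} < c,$$

whence

$$|\beta_i| \le E|v_i| < \frac{c}{N^{\delta}}. \tag{2}$$

Likewise, the probability $q_i$ for $v_i \ne 0$ satisfies the inequality

$$N^{1+\delta}q_i \le E|v_i|^{1+\delta} < c,$$

whence

$$q_i < \frac{c}{N^{1+\delta}}. \tag{3}$$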
Now, let us consider two inequalities

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \sigma, \tag{4}$$

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \sigma, \tag{5}$$

where σ is an arbitrary positive number, and let $P_0$ and P be their respective probabilities. The inequalities (4) and (5) coincide when

$$v_1 = v_2 = \cdots = v_n = 0.$$

With this supplementary condition they have the same probability Q. But they can hold also when at least one of the numbers

$$v_1, v_2, \ldots, v_n$$

is different from 0. Let the probabilities of (4) and (5) under such circumstances be $R_0$ and R. Then

$$P_0 = Q + R_0, \qquad P = Q + R.$$

But evidently neither $R_0$ nor R can exceed the probability that in the series

$$v_1, v_2, \ldots, v_n$$

at least one number is different from 0; this probability in turn does not exceed (see Chap. II, page 30)

$$q_1 + q_2 + \cdots + q_n < \frac{nc}{N^{1+\delta}}.$$

Hence

$$R_0 < \frac{nc}{N^{1+\delta}}, \qquad R < \frac{nc}{N^{1+\delta}},$$

and

$$|P - P_0| < \frac{nc}{N^{1+\delta}}. \tag{6}$$

On the other hand, since none of the values of $|u_i|$ $(i = 1, 2, \ldots, n)$ exceeds N, we have

$$E(u_i^2) \le N^{1-\delta}E|u_i|^{1+\delta} < cN^{1-\delta}.$$

Accordingly, the dispersion of the sum $u_1 + u_2 + \cdots + u_n$ will be less than

$$cnN^{1-\delta}.$$

Hence, by what has been proved in Sec. 2, the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| < \frac{\varepsilon}{2}, \tag{7}$$

ε being an arbitrary positive number, is greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n}.$$

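But whenever (7) is satisfied, the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\varepsilon}{2} + \frac{|\beta_1 + \beta_2 + \cdots + \beta_n|}{n} \tag{8}$$

is also satisfied. Hence, the probability of this inequality is a fortiori greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n}.$$

Owing to inequalities (2), the following inequality follows from (8):

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \frac{\varepsilon}{2} + \frac{c}{N^{\delta}} = \sigma.$$

Hence, with this choice of σ,

$$P_0 > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n},$$

and on account of (6)

$$P > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n} - \frac{nc}{N^{1+\delta}}.$$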

Now we can dispose of the arbitrary number N by taking

$$N = \frac{\varepsilon n}{2}.$$

Then

$$P > 1 - \frac{2^{2+\delta}c}{\varepsilon^{1+\delta}n^{\delta}}.$$

Now N tends to infinity with n, and as soon as n surpasses a certain limit $n_0$, the fraction

$$\frac{c}{N^{\delta}}$$

will become and remain less than ε/2. The probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \varepsilon$$

for $n > n_0$ will be greater than P (since then σ < ε) and consequently greater than

$$1 - \frac{2^{2+\delta}c}{\varepsilon^{1+\delta}n^{\delta}}.$$

It tends, therefore, to 1 as n tends to infinity, and that proves Markoff's theorem.
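The heart of Markoff's argument is the split $x_i = u_i + v_i$ and the three inequalities (1) to (3). The following Python sketch checks them numerically for one finite distribution. The distribution, the exponent δ = 0.5 and the cut-off values N are hypothetical choices for illustration only. Here c is taken equal to $E|x|^{1+\delta}$ itself, so the strict inequalities of the text appear in their non-strict form.

```python
# Markoff's device: x = u + v, with u = x if |x| <= N (else 0) and v = x if |x| > N (else 0).
# The finite distribution below is a hypothetical example with E(x) = 0.
values = [-20.0, -3.0, -1.0, 1.0, 2.0, 21.0]
probs = [0.01, 0.10, 0.39, 0.30, 0.19, 0.01]


def truncation_summary(values, probs, N, delta):
    """Return c = E|x|^(1+delta), beta = E(u), E(v), E|v| and q = P(v != 0)."""
    pairs = list(zip(values, probs))
    c = sum(p * abs(a) ** (1 + delta) for a, p in pairs)
    beta = sum(p * a for a, p in pairs if abs(a) <= N)
    e_v = sum(p * a for a, p in pairs if abs(a) > N)
    e_abs_v = sum(p * abs(a) for a, p in pairs if abs(a) > N)
    q = sum(p for a, p in pairs if abs(a) > N)
    return c, beta, e_v, e_abs_v, q


delta = 0.5
for N in (2.5, 5.0, 10.0):
    c, beta, e_v, e_abs_v, q = truncation_summary(values, probs, N, delta)
    # (1) beta = -E(v);  (2) |beta| <= E|v| <= c / N^delta;  (3) q <= c / N^(1 + delta)
    print(N, beta + e_v, abs(beta) <= e_abs_v <= c / N ** delta, q <= c / N ** (1 + delta))
```

Because $|v| > N$ whenever $v \ne 0$, the bounds (2) and (3) shrink as N grows, while $E(u^2) \le N^{1-\delta}E|u|^{1+\delta}$ stays finite. This is why a finite $(1+\delta)$-th moment is enough even when $E(x^2)$ does not exist.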

Example. Let the possible values of the variable $x_p$ $(p = 1, 2, 3, \ldots)$ be
p-Kp + l)i, p-Kp + 1)*, p-Kp 4-1)1, . . . 
with the corresponding probabilities 


Since the series 


P P _ P 

p 4- l’ (P + !)•* (P 4- !)•* ' ‘ * 




P 


4 - . . . 


is divergent, the mathematical expectation $E(x_p^2)$ does not exist. Yet the law of large numbers holds. For




-2 




is a convergent series for any $0 < \delta < 1$. Moreover,


-2 




1 

- 1-6 ' 
2 2-1 


and consequently the conditions of Markoff's theorem are satisfied for any $0 < \delta < 1$.
Hence, the law of large numbers holds in this example.

9. If the variables $x_1, x_2, x_3, \ldots$ are identical, the law of large numbers holds without any other restriction, except that the mathematical expectations of these variables exist. In fact, Khintchine proved the following theorem:

Theorem. If, as we may naturally suppose, $E(x_i) = 0$, the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \varepsilon$$

tends to 1 as n increases indefinitely.

Proof. The proof is quite similar to that of Markoff's theorem and is based on the same ingenious artifice. Let

$$\cdots < \alpha_{-2} < \alpha_{-1} < \alpha_0 < \alpha_1 < \alpha_2 < \cdots$$

be the different values of any one of the identical variables $x_1, x_2, x_3, \ldots$ and

$$\ldots,\ p_{-2},\ p_{-1},\ p_0,\ p_1,\ p_2,\ \ldots$$

their probabilities. By hypothesis

$$\sum p_i \alpha_i$$

is a convergent series with the sum 0. The series

$$\sum p_i |\alpha_i|$$

is also convergent; let $c > 0$ be its sum.

Keeping the same notations as before, we have

$$|\beta_i| \le E|v_i| = \sum_{|\alpha_j| > N} p_j |\alpha_j| = \psi(N),$$

where $\psi(N)$ is a decreasing function tending to 0 as $N \to \infty$. Also

$$E(u_i^2) \le N E|x_i| = cN,$$

so that the dispersion of the sum

$$u_1 + u_2 + \cdots + u_n$$

is less than

$$cNn.$$


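Consequently the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| < \frac{\varepsilon}{2} \tag{9}$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n}.$$

On the other hand, the probability $q_i$ of the inequality $v_i \ne 0$ does not exceed

$$\frac{\psi(N)}{N},$$

because

$$N q_i = N\sum_{|\alpha_j| > N} p_j \le \sum_{|\alpha_j| > N} p_j |\alpha_j| = \psi(N).$$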

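Hence, for any positive σ, the difference between the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \sigma$$

and that of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \sigma$$

does not numerically exceed

$$\frac{n\psi(N)}{N}.$$

As in the preceding section we conclude that the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\varepsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n}.$$

Finally, the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \frac{\varepsilon}{2} + \psi(N) \tag{10}$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n} - \frac{n\psi(N)}{N}.$$

To dispose of N we observe that the ratio

$$\frac{\sqrt{\psi(N)}}{N}$$

is a decreasing function of N and tends to 0 as $N \to \infty$. Hence, for large n, there exists an integer N such that

$$\frac{\sqrt{\psi(N)}}{N} \le \frac{1}{n} < \frac{\sqrt{\psi(N-1)}}{N-1}.$$

Then

$$\frac{n\psi(N)}{N} \le \frac{N}{n}, \qquad N - 1 < n\sqrt{\psi(N-1)}.$$

Hence

$$\frac{4cN}{\varepsilon^2 n} + \frac{n\psi(N)}{N} \le \left(1 + \frac{4c}{\varepsilon^2}\right)\frac{N}{n} < \left(1 + \frac{4c}{\varepsilon^2}\right)\left[\frac{1}{n} + \sqrt{\psi(N-1)}\right],$$

whence it follows that the probability of inequality (10) is greater than

$$1 - \left(1 + \frac{4c}{\varepsilon^2}\right)\left[\frac{1}{n} + \sqrt{\psi(N-1)}\right].$$

Now N increases indefinitely together with n; therefore, for all n above a certain limit $n_0$,

$$\psi(N) < \frac{\varepsilon}{2},$$

so that for $n > n_0$ the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \varepsilon$$

will be greater than

$$1 - \left(1 + \frac{4c}{\varepsilon^2}\right)\left[\frac{1}{n} + \sqrt{\psi(N-1)}\right]$$

and with indefinitely increasing n will approach the limit 1. Thus Khintchine's theorem is completely proved.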
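The choice of N in Khintchine's proof can be made concrete. The sketch below is an illustration under stated assumptions. It uses a hypothetical symmetric law with $P(|x| > t) = t^{-3/2}$ for $t \ge 1$, chosen because $\psi(N) = E(|x|;\, |x| > N)$ then has the closed form $3N^{-1/2}$, with $E|x| = 3$ finite and $E(x^2)$ infinite. This is a continuous law, whereas the text works with countably many values, but the argument is the same. For each n the code takes the smallest integer N with $\sqrt{\psi(N)}/N \le 1/n$ and evaluates the lower limit for the probability of (10).

```python
import math

# Hypothetical symmetric law with P(|x| > t) = t**(-A) for t >= 1.
# For 1 < A < 2, E|x| is finite but E(x^2) is infinite.
A = 1.5
C = A / (A - 1)  # c = E|x| for this law


def psi(N):
    """psi(N) = E(|x|; |x| > N) = C * N**(1 - A) for N >= 1."""
    return C * N ** (1 - A)


def condition(N, n):
    """Left half of the choice rule: sqrt(psi(N)) / N <= 1 / n."""
    return math.sqrt(psi(N)) / N <= 1 / n


def choose_N(n):
    """Smallest integer N >= 2 satisfying condition(N, n)."""
    # sqrt(psi(N)) / N = sqrt(C) * N**(-(1 + A) / 2) gives a starting guess.
    N = max(2, int((math.sqrt(C) * n) ** (2 / (1 + A))))
    while not condition(N, n):
        N += 1
    while N > 2 and condition(N - 1, n):
        N -= 1
    return N


def khintchine_bound(n, eps):
    """Lower limit 1 - (1 + 4c/eps^2) * (1/n + sqrt(psi(N - 1)))."""
    N = choose_N(n)
    return N, 1 - (1 + 4 * C / eps ** 2) * (1 / n + math.sqrt(psi(N - 1)))


for k in (3, 6, 9, 12, 15):
    N, bound = khintchine_bound(10 ** k, eps=0.5)
    print(f"n = 1e{k}: N = {N}, bound = {bound:.4f}")
```

The bound is negative (and so says nothing) for moderate n and only slowly approaches 1. That is typical of a proof which assumes nothing beyond the existence of $E(x)$.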

Example. Let 

2*”*<»*, 2*"*««*, . . . . . . 
be all possible values of identical variables $x_1, x_2, x_3, \ldots$ and

^ 1 i i 

? 2 *’ ‘ * 2 *’ ‘ ‘ * 

their corresponding probabilities. Since the series 

_i_ + _L . =.i+_L+_L+... 

2siaci ~ 2*‘««* 2***^ 3**** 

is convergent, mathematical expectations of the variables $x_1, x_2, x_3, \ldots$ exist. Hence, the law of large numbers holds in this case.

Markoff's theorem cannot be applied here, because for any positive δ the series

2»* 

1 • 

is divergent. 

Problems for Solution 

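1. Let x be a stochastic variable with the mean 0 and the standard deviation σ. Denoting by P(t) the probability of the inequality

$$x \le t,$$

show that

$$P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t < 0, \qquad 1 - P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t > 0.$$

Show also that the right-hand members cannot be replaced by smaller numbers.

Indication of the Proof. Since

$$\sum p_i x_i = 0, \qquad \sum p_i x_i^2 = \sigma^2,$$

we have also

$$\sum p_i (x_i - t) = -t, \qquad \sum p_i (x_i - t)^2 = \sigma^2 + t^2,$$

whence, supposing that $x_i > t$ for $i = 1, 2, \ldots, s$ and $x_i \le t$ for the remaining values, and first taking t negative,

$$t^2 \le \left[\sum_{i=1}^{s} p_i (x_i - t)\right]^2 \le \sum_{i=1}^{s} p_i \cdot \sum_{i=1}^{s} p_i (x_i - t)^2 \le (1 - P(t))(\sigma^2 + t^2),$$

so that

$$1 - P(t) \ge \frac{t^2}{\sigma^2 + t^2}, \qquad P(t) \le \frac{\sigma^2}{\sigma^2 + t^2}.$$

For positive t the proof is quite similar. Considering a stochastic variable with two values:

$$x_1 = t, \quad p_1 = \frac{\sigma^2}{\sigma^2 + t^2}; \qquad x_2 = -\frac{\sigma^2}{t}, \quad p_2 = \frac{t^2}{\sigma^2 + t^2},$$

one can easily prove the last part of our statement.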
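The two-point variable in the indication can be checked in exact arithmetic. In this sketch the values σ² = 4 and t = -3 are arbitrary choices for the illustration. It confirms that the variable has mean 0 and variance σ², and that for negative t it attains $P(t) = \sigma^2/(\sigma^2 + t^2)$ exactly. For positive t the bound on $1 - P(t)$ is only approached, because the mass sits exactly at t.

```python
from fractions import Fraction


def two_point(sigma2, t):
    """Values t and -sigma2/t with probabilities sigma2/(sigma2+t^2) and t^2/(sigma2+t^2)."""
    x1, x2 = t, -sigma2 / t
    p1 = sigma2 / (sigma2 + t * t)
    p2 = t * t / (sigma2 + t * t)
    return (x1, p1), (x2, p2)


sigma2, t = Fraction(4), Fraction(-3)  # variance 4, threshold t = -3 < 0
(x1, p1), (x2, p2) = two_point(sigma2, t)
mean = p1 * x1 + p2 * x2
var = p1 * x1 ** 2 + p2 * x2 ** 2
prob_le_t = sum(p for x, p in ((x1, p1), (x2, p2)) if x <= t)  # P(t) = P(x <= t)

print(mean, var, prob_le_t, sigma2 / (sigma2 + t * t))  # 0 4 4/13 4/13
```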

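2. Tshebysheff's Problem.¹ If x is a positive stochastic variable with given

$$E(x) = \sigma^2, \qquad E(x^2) = \tau^4,$$

then the probability P of the inequality

$$x \ge v$$

has the following precise upper bounds:

$$P \le 1 \quad \text{for } v < \sigma^2,$$

$$P \le \frac{\sigma^2}{v} \quad \text{for } \sigma^2 \le v < \frac{\tau^4}{\sigma^2},$$

$$P \le \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v} \quad \text{for } v \ge \frac{\tau^4}{\sigma^2}.$$

Indication of the Proof. Let

$$\xi = \frac{\sigma^2 v - \tau^4}{v - \sigma^2}.$$

Then $\xi < v$ if $v \ge \tau^4/\sigma^2$, and

$$\frac{(x - \xi)^2}{(v - \xi)^2} \ge 1$$

for $x \ge v$. Hence

$$P \le E\frac{(x - \xi)^2}{(v - \xi)^2}.$$

On the other hand,

$$E(x - \xi)^2 = \tau^4 - 2\sigma^2\xi + \xi^2,$$

whence

$$P \le \frac{\tau^4 - 2\sigma^2\xi + \xi^2}{(v - \xi)^2} = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$

The equality sign is reached for the stochastic variable with two values:

$$x_1 = \frac{\sigma^2 v - \tau^4}{v - \sigma^2}, \quad p_1 = \frac{(v - \sigma^2)^2}{\tau^4 + v^2 - 2\sigma^2 v}; \qquad x_2 = v, \quad p_2 = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$

If $\sigma^2 \le v < \tau^4/\sigma^2$ we have an obvious inequality

$$P \le E\left(\frac{x}{v}\right) = \frac{\sigma^2}{v}.$$

To show that the right-hand member cannot be replaced by a smaller number, consider the following stochastic variable with three values:

$$x_1 = 0, \quad p_1 = \frac{(l - \sigma^2)v - l\sigma^2 + \tau^4}{lv};$$

$$x_2 = v, \quad p_2 = \frac{l\sigma^2 - \tau^4}{v(l - v)};$$

$$x_3 = l, \quad p_3 = \frac{\tau^4 - \sigma^2 v}{l(l - v)},$$

where $l > v$ is an arbitrary number. For this variable

$$P = p_2 + p_3 = \frac{\sigma^2}{v} - \frac{\tau^4 - \sigma^2 v}{vl}$$

is arbitrarily near to $\sigma^2/v$ for sufficiently large l.

¹ Sur les valeurs limites des intégrales, Jour. Liouville, Ser. 2, T. XIX, 1874.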
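The extremal variables of Tshebysheff's problem can be verified in exact rational arithmetic. In the following sketch σ² = 1, τ⁴ = 2 and the thresholds v = 3 and v = 3/2 are hypothetical numbers chosen so that both regimes occur, since $\tau^4/\sigma^2 = 2$. The code confirms the prescribed moments and the value of $P(x \ge v)$ for the two-point and three-point laws of the indication.

```python
from fractions import Fraction as F


def two_point_law(s2, t4, v):
    """Extremal law of Prob. 2 for v >= t4/s2, as [(value, probability), ...]."""
    D = t4 + v * v - 2 * s2 * v
    return [((s2 * v - t4) / (v - s2), (v - s2) ** 2 / D), (v, (t4 - s2 * s2) / D)]


def three_point_law(s2, t4, v, l):
    """Three-point law 0, v, l of Prob. 2 for s2 <= v < t4/s2 and l > v."""
    p2 = (l * s2 - t4) / (v * (l - v))
    p3 = (t4 - s2 * v) / (l * (l - v))
    return [(F(0), 1 - p2 - p3), (v, p2), (l, p3)]


def moments(law):
    """Total probability, E(x) and E(x^2) of a finite law."""
    return (sum(p for _, p in law),
            sum(p * x for x, p in law),
            sum(p * x * x for x, p in law))


def prob_at_least(law, v):
    return sum(p for x, p in law if x >= v)


s2, t4 = F(1), F(2)  # sigma^2 = 1, tau^4 = 2, so tau^4 / sigma^2 = 2
law = two_point_law(s2, t4, F(3))
print(moments(law), prob_at_least(law, F(3)))  # P equals (tau^4 - sigma^4)/(tau^4 + v^2 - 2 sigma^2 v) = 1/5

for l in (F(10), F(100), F(1000)):
    law3 = three_point_law(s2, t4, F(3, 2), l)
    print(l, moments(law3), prob_at_least(law3, F(3, 2)))  # tends to sigma^2 / v = 2/3
```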

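3. If x is an arbitrary stochastic variable with given

$$E(x^2) = \sigma^2, \qquad E(x^4) = \tau^4,$$

and P denotes the probability of the inequality

$$|x| \ge k\sigma,$$

then

$$P \le \frac{1}{k^2} \quad \text{for } 1 \le k^2 \le \frac{\tau^4}{\sigma^4}, \qquad P \le \frac{\tau^4 - \sigma^4}{\tau^4 + k^4\sigma^4 - 2k^2\sigma^4} \quad \text{for } k^2 \ge \frac{\tau^4}{\sigma^4}.$$

These inequalities cannot be improved.

Hint: Follows from Tshebysheff's problem.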

4. Let $x_i$ assume two values, i and -i, with equal probabilities. Show that the law of large numbers cannot be applied to the variables $x_1, x_2, x_3, \ldots$.

5. The variables $x_1, x_2, x_3, \ldots$ each assume two values:

$$\log a \text{ or } -\log a; \quad \log(a+1) \text{ or } -\log(a+1); \quad \log(a+2) \text{ or } -\log(a+2); \quad \ldots$$

with equal probabilities. Show that the law of large numbers holds for these variables.

Hint: $E(x_i) = 0$; $i = 1, 2, 3, \ldots$;

$$B_n = E(x_1 + x_2 + \cdots + x_n)^2 = \sum_{i=0}^{n-1}\{\log(a + i)\}^2 \sim (a + n - 1)\{\log(a + n - 1)\}^2,$$

as can easily be established by using Euler's summation formula (Appendix 1, page 347). Hence

$$\frac{B_n}{n^2} \to 0 \qquad \text{as } n \to \infty.$$

6. If $x_i$ can have only two values, $i^{\alpha}$ and $-i^{\alpha}$, with equal probabilities, show that the law of large numbers can be applied to $x_1, x_2, x_3, \ldots$ if $\alpha < \tfrac{1}{2}$.²

Hint:

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}, \qquad \frac{B_n}{n^2} \to 0 \quad \text{if } \alpha < \tfrac{1}{2}.$$

² It can be shown that the law of large numbers does not hold if $\alpha \ge \tfrac{1}{2}$.

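7. In an indefinite Bernoullian series of trials with the constant probability p, let $m_i$ denote the number of successes in the first i trials. Show that the law of large numbers holds for the variables

$$x_i = \frac{m_i - ip}{(ipq)^{\alpha}}, \qquad i = 1, 2, 3, \ldots,$$

if $\alpha > \tfrac{1}{2}$.

Hint: Evidently $E(x_i) = 0$, $E(x_i^2) = (ipq)^{1-2\alpha}$, and

$$B_n = \sum_{i=1}^{n} E(x_i^2) + 2\sum_{i=1}^{n}\sum_{j>i} E(x_i x_j).$$

Now, for $j > i$,

$$E(x_i x_j) = (ij)^{-\alpha}(pq)^{-2\alpha}E(m_i - ip)^2 + (ij)^{-\alpha}(pq)^{-2\alpha}E\{(m_i - ip)(m_j - m_i - (j - i)p)\} = i^{1-\alpha}j^{-\alpha}(pq)^{1-2\alpha},$$

since $m_i - ip$ and $m_j - m_i - (j - i)p$ are independent variables. Thus

$$B_n = (pq)^{1-2\alpha}\left[\sum_{i=1}^{n} i^{1-2\alpha} + 2\sum_{i=1}^{n}\sum_{j>i} i^{1-\alpha}j^{-\alpha}\right],$$

and it is easy to show that

$$\frac{B_n}{n^2} \to 0 \qquad \text{as } n \to \infty,$$

provided $\alpha > \tfrac{1}{2}$. But the law of large numbers no longer holds if $\alpha \le \tfrac{1}{2}$. The proof of this is more difficult.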
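The behaviour of $B_n/n^2$ in the hint can be seen numerically. This Python sketch evaluates the bracketed double sum with a running prefix sum, so the cost is O(n). The values p = 1/2, α = 0.75 and α = 0.5 are chosen only for illustration. For α = 0.75 the ratio decreases roughly like $n^{-1/2}$; for α = 0.5 it stays near 2/3.

```python
def dispersion_ratio(n, alpha, p=0.5):
    """B_n / n^2 with B_n = (pq)^(1-2a) [sum_i i^(1-2a) + 2 sum_{i<j} i^(1-a) j^(-a)]."""
    q = 1 - p
    diag = sum(i ** (1 - 2 * alpha) for i in range(1, n + 1))
    cross = 0.0
    prefix = 0.0  # running sum of i^(1 - alpha) over i < j
    for j in range(1, n + 1):
        cross += prefix * j ** (-alpha)
        prefix += j ** (1 - alpha)
    return (p * q) ** (1 - 2 * alpha) * (diag + 2 * cross) / n ** 2


for n in (40, 400, 4000):
    print(n, dispersion_ratio(n, 0.75), dispersion_ratio(n, 0.5))
```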

8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff. Let $x_1, x_2, \ldots, x_n$ be independent variables; $E(x_i) = 0$, $E(x_i^2) = b_i$,

$$B_n = b_1 + b_2 + \cdots + b_n$$

and

$$s_k = x_1 + x_2 + \cdots + x_k; \qquad k = 1, 2, \ldots, n.$$

Denoting by P the probability of the inequality

$$\max(s_1^2, s_2^2, \ldots, s_n^2) > B_n t^2, \tag{A}$$

we shall have $P < 1/t^2$.

Indication of the Proof. The inequality (A) can materialize if and only if one of the following mutually exclusive events occurs:

event $e_1$: $s_1^2 > B_n t^2$;
event $e_2$: $s_1^2 \le B_n t^2$, $s_2^2 > B_n t^2$;
event $e_3$: $s_1^2 \le B_n t^2$, $s_2^2 \le B_n t^2$, $s_3^2 > B_n t^2$;
and so on, up to
event $e_n$: $s_1^2 \le B_n t^2$, $s_2^2 \le B_n t^2$, ..., $s_{n-1}^2 \le B_n t^2$, $s_n^2 > B_n t^2$.

If $(e_i)$ represents the probability of $e_i$ $(i = 1, 2, \ldots, n)$, then

$$P = (e_1) + (e_2) + \cdots + (e_n).$$

Now consider the conditional mathematical expectation $E(s_n^2 \mid e_k)$ of $s_n^2$ given that $e_k$ has occurred. Since the occurrence of $e_k$ does not affect the variables $x_{k+1}, x_{k+2}, \ldots, x_n$, these variables and $s_k$ are independent. Hence

$$E(s_n^2 \mid e_k) = E(s_k^2 \mid e_k) + b_{k+1} + b_{k+2} + \cdots + b_n > B_n t^2.$$

On the other hand,

$$B_n = E(s_n^2) \ge \sum_{k=1}^{n} E(s_n^2 \mid e_k)(e_k) > B_n t^2\{(e_1) + (e_2) + \cdots + (e_n)\},$$

whence $P < 1/t^2$.
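Kolmogoroff's inequality can be checked exactly in a small case by enumerating every path. The sketch below assumes, purely for illustration, that the $x_i$ take the values ±1 with equal probabilities, so that $b_i = 1$ and $B_n = n$. It computes $P\{\max_k s_k^2 > B_n t^2\}$ over all $2^n$ sign sequences and compares it with $1/t^2$.

```python
from itertools import product


def kolmogoroff_probability(n, t):
    """Exact P(max_k s_k^2 > B_n t^2) for x_i = +1 or -1 with equal probabilities."""
    B_n = n  # each x_i has E(x_i^2) = 1
    hits = 0
    for signs in product((-1, 1), repeat=n):
        s, largest = 0, 0
        for x in signs:
            s += x
            largest = max(largest, s * s)
        if largest > B_n * t * t:
            hits += 1
    return hits / 2 ** n


for t in (1.2, 1.5, 2.0):
    print(t, kolmogoroff_probability(12, t), 1 / t ** 2)
```

The exact probabilities come out well below $1/t^2$. The inequality is not sharp here, but it controls the whole path $s_1, \ldots, s_n$ at the price of Tshebysheff's bound for $s_n$ alone.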

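9. The Strong Law of Large Numbers (Kolmogoroff). Using the same notations as in the preceding problem, show that the probability of the simultaneous inequalities

$$\left|\frac{s_n}{n}\right| \le \varepsilon, \quad \left|\frac{s_{n+1}}{n+1}\right| \le \varepsilon, \quad \left|\frac{s_{n+2}}{n+2}\right| \le \varepsilon, \quad \ldots$$

will be greater than $1 - \eta$ provided n exceeds a certain limit depending on the choice of ε and η, and granted the convergence of the series

$$\sum_{k=1}^{\infty} \frac{b_k}{k^2}.$$

Indication of the Proof. Consider the variables

$$\tau_i = \max \frac{|s_m - s_{n-1}|}{m} \qquad \text{for } 2^{i-1}n \le m < 2^i n; \quad i = 1, 2, 3, \ldots$$

and denote by $q_i$ the probability of the inequality $\tau_i > \varepsilon/2$. By Kolmogoroff's lemma

$$q_i < \frac{4\sum_{l=n}^{2^i n - 1} b_l}{2^{2i-2}n^2\varepsilon^2}$$

and

$$q_1 + q_2 + q_3 + \cdots < \frac{16}{\varepsilon^2}\sum_{i=1}^{\infty}\sum_{l=n}^{2^i n - 1}\frac{b_l}{4^i n^2} = \frac{16}{\varepsilon^2}\sum_{l=n}^{\infty} b_l \sum_{2^i n > l}\frac{1}{4^i n^2},$$

or

$$q_1 + q_2 + q_3 + \cdots < \frac{64}{3\varepsilon^2}\sum_{k=n}^{\infty}\frac{b_k}{k^2}.$$

Hence, the probability of fulfillment of all the inequalities $\tau_i \le \varepsilon/2$; $i = 1, 2, 3, \ldots$ is greater than

$$1 - \frac{64}{3\varepsilon^2}\sum_{k=n}^{\infty}\frac{b_k}{k^2}.$$

The inequalities $|s_k/k| \le \varepsilon$; $k = n, n+1, n+2, \ldots$ are satisfied when simultaneously

$$\tau_i \le \frac{\varepsilon}{2}; \quad i = 1, 2, 3, \ldots \qquad \text{and} \qquad \left|\frac{s_{n-1}}{n}\right| \le \frac{\varepsilon}{2}.$$

The probability of the last inequality being greater than $1 - 4B_{n-1}/(n^2\varepsilon^2)$, the probability of the simultaneous inequalities

$$\left|\frac{s_k}{k}\right| \le \varepsilon; \qquad k = n, n+1, n+2, \ldots$$

a fortiori will be greater than

$$1 - \frac{4B_{n-1}}{n^2\varepsilon^2} - \frac{64}{3\varepsilon^2}\sum_{k=n}^{\infty}\frac{b_k}{k^2}.$$

This inequality suffices to complete the proof if we notice that $B_n/n^2$ tends to 0 when the series

$$\sum_{k=1}^{\infty}\frac{b_k}{k^2}$$

is convergent.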

10. Let $x_1, x_2, \ldots, x_n$ be identical stochastic variables and $E(x_i) = 0$. Denoting by $P_n(\varepsilon)$ and $Q_n(\varepsilon)$, respectively, the probabilities of the inequalities

$$\frac{x_1 + x_2 + \cdots + x_n}{n} > \varepsilon \qquad \text{and} \qquad \frac{x_1 + x_2 + \cdots + x_n}{n} < -\varepsilon,$$

show that

$$\lim_{n \to \infty}\frac{Q_n(\varepsilon)}{P_n(\varepsilon)} = 0 \ \text{ or } \ +\infty$$

according as $E(x_i^3) > 0$ or $< 0$.

For the proof see Khintchine's paper in Mathematische Annalen (vol. 101, pp. 381-385).

11. The Law of the Repeated Logarithm (Khintchine, Kolmogoroff). Let $x_1, x_2, \ldots, x_n$ be bounded independent variables, $E(x_i) = 0$, $i = 1, 2, \ldots, n$, and $B_n \to \infty$ as $n \to \infty$. For arbitrarily small $\delta > 0$ and $\varepsilon > 0$ and for an arbitrarily large N one can choose $n_0 > N$ so that:

a. The probability of the fulfillment of the inequality

$$|s_n| > (1 + \delta)\sqrt{2B_n \log\log B_n}$$

for at least one $n \ge n_0$ is less than ε.

b. The probability of the fulfillment of the inequality

$$|s_n| > (1 - \delta)\sqrt{2B_n \log\log B_n}$$

for at least one n with $N \le n \le n_0$ is greater than $1 - \varepsilon$.

For the proof see Kolmogoroff's paper in Mathematische Annalen (vol. 101, pp. 126-135).

If $x_1, x_2, \ldots, x_n$ are variables independent in pairs and $B_n$ the dispersion of their sum $s = x_1 + x_2 + \cdots + x_n$, then the probability P that

$$|s| \le t\sqrt{B_n}$$

satisfies the inequality

$$P \ge 1 - \frac{1}{t^2} \qquad \text{(Tshebysheff's inequality)},$$

provided $E(x_i) = 0$, $i = 1, 2, \ldots, n$, which can be assumed without loss of generality. In case the variables are totally independent and are subject to certain limitations of comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be considerably improved.

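12. Let $x_1, x_2, \ldots, x_n$ be totally independent variables. We suppose $E(x_i) = 0$, $E(x_i^2) = b_i$ and

$$|E(x_i^h)| \le \frac{h!}{2}\, b_i\, c^{h-2}$$

for $i = 1, 2, \ldots, n$ and $h > 2$, c being a certain constant. Show that

$$E\left(e^{\varepsilon(x_1 + x_2 + \cdots + x_n)}\right) \le e^{\frac{B_n\varepsilon^2}{2(1-\sigma)}},$$

where σ is an arbitrary positive number < 1 and ε is a positive number so small that $\varepsilon c \le \sigma$.

Indication of the Proof. We have

$$E\left(e^{\varepsilon x_i}\right) \le 1 + \frac{\varepsilon^2 b_i}{2}\sum_{h=2}^{\infty}(\varepsilon c)^{h-2} \le 1 + \frac{\varepsilon^2 b_i}{2(1-\sigma)} < e^{\frac{\varepsilon^2 b_i}{2(1-\sigma)}},$$

whence, the variables being independent,

$$E\left(e^{\varepsilon(x_1 + x_2 + \cdots + x_n)}\right) = \prod_{i=1}^{n} E\left(e^{\varepsilon x_i}\right) \le e^{\frac{B_n\varepsilon^2}{2(1-\sigma)}}.$$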
13. If Q denotes the probability of the inequality

$$x_1 + x_2 + \cdots + x_n \ge \frac{B_n\varepsilon}{2(1-\sigma)} + \frac{t^2}{\varepsilon},$$

show that $Q < e^{-t^2}$.

Indication of the Proof. If $\bar Q$ is the probability of the inequality

$$e^{\varepsilon(x_1 + x_2 + \cdots + x_n)} \ge e^{t^2}E\left(e^{\varepsilon(x_1 + x_2 + \cdots + x_n)}\right),$$

then, by Tshebysheff's lemma, $\bar Q < e^{-t^2}$, and $Q \le \bar Q$ by Prob. 12.

14. S. Bernstein's Inequality. Denoting by P the probability of the inequality

$$|x_1 + x_2 + \cdots + x_n| < u,$$

u being a given positive number, show that

$$P > 1 - 2e^{-\frac{u^2}{2B_n + 2cu}}.$$

Indication of the Proof. To make

$$F = \frac{B_n\varepsilon}{2(1-\sigma)} + \frac{t^2}{\varepsilon}$$

a minimum take $\varepsilon = t\sqrt{2(1-\sigma)/B_n}$; then

$$F = t\sqrt{\frac{2B_n}{1-\sigma}},$$

and t is determined by equating F to u. The resulting value of ε,

$$\varepsilon = \frac{u(1-\sigma)}{B_n},$$

is admissible only if $\varepsilon c \le \sigma$, or $uc(1-\sigma)/B_n \le \sigma$. The best choice for σ is $\sigma = cu/(B_n + cu)$, and correspondingly

$$t^2 = \frac{u^2}{2B_n + 2cu}.$$

By Prob. 13 the probability of the inequality

$$x_1 + x_2 + \cdots + x_n \ge u$$

is less than $e^{-u^2/(2B_n + 2cu)}$; the same is true of the probability of the inequality

$$x_1 + x_2 + \cdots + x_n \le -u \quad \text{or} \quad -x_1 - x_2 - \cdots - x_n \ge u.$$
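The parameter choices in the indication to Prob. 14 can be verified in exact arithmetic. In the sketch below the values $B_n = 10$, c = 1/3 and u = 4 are hypothetical. It checks that the minimum of F equals u, that the admissibility condition $\varepsilon c \le \sigma$ holds with equality at the best σ, and that $t^2 = u^2/(2B_n + 2cu)$.

```python
import math
from fractions import Fraction as F


def bernstein_parameters(B, c, u):
    """Choices made in the indication to Prob. 14 (exact rational arithmetic)."""
    sigma = c * u / (B + c * u)          # best admissible sigma
    eps = u * (1 - sigma) / B            # minimiser of B eps / (2(1 - sigma)) + t^2 / eps
    t2 = u * u * (1 - sigma) / (2 * B)   # from t * sqrt(2B / (1 - sigma)) = u
    F_min = B * eps / (2 * (1 - sigma)) + t2 / eps
    return sigma, eps, t2, F_min


B, c, u = F(10), F(1, 3), F(4)  # hypothetical values of B_n, c and u
sigma, eps, t2, F_min = bernstein_parameters(B, c, u)
print(sigma, eps, t2, F_min)    # 2/17 6/17 12/17 4
print(1 - 2 * math.exp(-t2))    # Bernstein's lower limit for P(|x_1 + ... + x_n| < u)
```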

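15. If the variables $x_1, x_2, \ldots, x_n$ are uniformly bounded and M is an upper bound of their numerical values, then we may take $c = M/3$.

Indication of the Proof. Note that

$$|E(x_i^h)| \le M^{h-2}b_i \le \frac{h!}{2}\, b_i\left(\frac{M}{3}\right)^{h-2} \qquad \text{for } h > 2.$$

16. Consider a Poisson's series of trials with probabilities $p_1, p_2, \ldots, p_n$ for an event E to occur. Let m be the frequency of E in n trials,

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}, \qquad \lambda = \frac{1}{n}(p_1q_1 + p_2q_2 + \cdots + p_nq_n).$$

Show that the probability P of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon$$

has the following lower limit:

$$P > 1 - 2e^{-\frac{n\varepsilon^2}{2\lambda + \frac{2}{3}\varepsilon}}.$$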
In the Bernoullian case $p_1 = p_2 = \cdots = p_n = p$, $\lambda = pq$, and consequently

$$P > 1 - 2e^{-\frac{n\varepsilon^2}{2pq + \frac{2}{3}\varepsilon}}.$$

17. An indefinite series of totally independent variables $x_1, x_2, x_3, \ldots$ has the property that the mathematical expectation of any odd power of these variables is rigorously 0, while

$$E(x_i^{2h}) \le \frac{(2h)!}{h!}\left(\frac{b_i}{2}\right)^h, \qquad b_i = E(x_i^2),$$

for $i = 1, 2, 3, \ldots$ and $h = 1, 2, 3, \ldots$. Prove that the probability of either one of the inequalities

$$x_1 + x_2 + \cdots + x_n > t\sqrt{2B_n} \qquad \text{or} \qquad x_1 + x_2 + \cdots + x_n < -t\sqrt{2B_n},$$

where $B_n = b_1 + b_2 + \cdots + b_n$, is less than $e^{-t^2}$ (S. Bernstein). Prove first that

$$E\left(e^{\varepsilon x_i}\right) \le e^{\frac{\varepsilon^2 b_i}{2}}.$$

18. Positive and negative proper decimal fractions limited to, say, five decimals, are obtained in the following manner: From an urn containing tickets with numbers 0, 1, 2, ..., 9 in equal proportion, five tickets are drawn in succession (the ticket drawn in a previous trial being returned before the next), and their respective numbers are written in succession as the five decimals of a proper fraction. This fraction, if not equal to 0, is preceded by the sign + or - according as a coin tossed at the same time shows heads or tails. Thus, repeating this process several times, we may obtain as many positive or negative proper fractions with five decimals as we desire. What can be said about the probability that the sum of n such fractions will be contained between prescribed limits -u and u? Ans. These n fractions may be considered as so many identical stochastic variables for each of which

$$E(x) = 0, \qquad b = E(x^2) = \frac{(1 - 10^{-5})(2 - 10^{-5})}{6} < \frac{1}{3}.$$

Besides,

$$E(x^{2h}) = \frac{1^{2h} + 2^{2h} + \cdots + (10^5 - 1)^{2h}}{10^{5(2h+1)}} < \frac{1}{2h+1},$$

since in general

$$1^{2h} + 2^{2h} + \cdots + (N - 1)^{2h} < \frac{N^{2h+1}}{2h+1}.$$

Again, the inequality

$$E(x^{2h}) \le \frac{(2h)!}{h!}\left(\frac{b}{2}\right)^h; \qquad h = 1, 2, 3, \ldots$$

can easily be verified, and we can apply the result of Prob. 17. For the required probability P the following lower limit can be obtained:

$$P > 1 - 2e^{-\frac{3u^2}{2n}},$$

or, if $u = n\varepsilon$,

$$P > 1 - 2e^{-\frac{3n\varepsilon^2}{2}}.$$

For example, if $\varepsilon = \tfrac{1}{10}$ and $n \ge 814$,

$$P > 0.99999,$$

that is, almost certainly the sum of 814 fractions formed in the above described manner will be contained between -82 and 82.
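The numerical example in the answer to Prob. 18 can be reproduced directly. The following sketch computes b exactly from the uniform distribution of the five-digit fractions and then searches for the smallest n for which the lower limit $1 - 2e^{-3n\varepsilon^2/2}$ exceeds 0.99999 when ε = 1/10. The helper name `lower_limit` is our own.

```python
import math
from fractions import Fraction

N = 10 ** 5
# b = E(x^2) for x = +-k/N with k = 0, 1, ..., N - 1 equally likely (the sign does not affect x^2).
b = Fraction(sum(k * k for k in range(N)), N ** 3)
print(b == Fraction((N - 1) * (2 * N - 1), 6 * N * N), b < Fraction(1, 3))  # True True


def lower_limit(n, eps):
    """Lower limit 1 - 2 exp(-3 n eps^2 / 2) for P(|x_1 + ... + x_n| < n eps)."""
    return 1 - 2 * math.exp(-1.5 * n * eps ** 2)


n = 1
while lower_limit(n, 0.1) <= 0.99999:
    n += 1
print(n)  # 814
```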
CHAPTER XI

APPLICATIONS OF THE LAW OF LARGE NUMBERS 

1. A theorem of such wide generality as the law of large numbers is the source of a great many important particular theorems. We shall begin with a generalization of Bernoulli's theorem due to Poisson.

Let us consider a series of independent trials with the respective probabilities $p_1, p_2, p_3, \ldots$ varying from one trial to another. Considering n trials, we shall denote by m the number of successes. The arithmetic mean of the probabilities in n trials,

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n},$$

will be called the "mean probability in n trials." With such conditions and notations adopted, we can state Poisson's theorem as follows:

Poisson's Theorem. The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon,$$

for fixed $\varepsilon > 0$, no matter how small, can be made as near to 1 (certainty) as we please, provided the number of trials n is sufficiently large.

Proof. To show that this theorem is but a particular case of the law of large numbers, we use an artifice often applied in similar circumstances, namely, we associate with trials 1, 2, 3, ..., n the variables $x_1, x_2, x_3, \ldots, x_n$ defined as follows:

$x_i = 1$ in case of success in the ith trial,
$x_i = 0$ in case of failure in the ith trial.

Since the trials are independent, these variables are also independent. Moreover,

$$E(x_i) = E(x_i^2) = p_i,$$

and the dispersion of $x_i$ is

$$p_i - p_i^2 = p_i q_i.$$

The dispersion $B_n$ of the sum

$$x_1 + x_2 + \cdots + x_n$$

is the sum of the dispersions of its terms, that is,

$$B_n = p_1q_1 + p_2q_2 + \cdots + p_nq_n \le \frac{n}{4}.$$

At the same time, the former sum represents the number of successes m in n trials.
Now, applying the results established in Chap. X, Sec. 2, we arrive at this conclusion: Denoting by P the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon,$$

we shall have

$$P \ge 1 - \frac{B_n}{n^2\varepsilon^2} \ge 1 - \frac{1}{4n\varepsilon^2}.$$

It now suffices to take

$$n > \frac{1}{4\eta\varepsilon^2}$$

to have

$$P > 1 - \eta,$$

where η is an arbitrary positive number, no matter how small. That completes the proof of Poisson's theorem.

Evidently Bernoulli's theorem is contained in Poisson's theorem as a particular case when

$$p_1 = p_2 = \cdots = p_n = p.$$

Poisson himself attached great importance to his theorem and adopted for it the name "law of large numbers," which is still used by many authors. However, it appears more proper to reserve this name for the theorem established in Chap. X, Sec. 2, which is due to Tshebysheff.
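The two lower limits in the proof of Poisson's theorem can be compared with the exact probability, which is computable by a simple convolution over the trials. In this sketch the 200 probabilities (cycling between 0.1 and 0.9) and ε = 0.1 are hypothetical choices for the illustration. The function returns the exact value of $P\{|m/n - p| < \varepsilon\}$, the Tshebysheff limit $1 - B_n/(n^2\varepsilon^2)$ and the cruder limit $1 - 1/(4n\varepsilon^2)$.

```python
def success_distribution(ps):
    """Exact distribution of the number m of successes in independent trials."""
    dist = [1.0]  # dist[m] = probability of m successes so far
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for m, w in enumerate(dist):
            new[m] += w * (1 - p)  # failure in this trial
            new[m + 1] += w * p    # success in this trial
        dist = new
    return dist


def poisson_check(ps, eps):
    """Exact P(|m/n - p| < eps) and the two lower limits used in the proof."""
    n = len(ps)
    p_mean = sum(ps) / n
    dist = success_distribution(ps)
    exact = sum(w for m, w in enumerate(dist) if abs(m / n - p_mean) < eps)
    B_n = sum(p * (1 - p) for p in ps)
    return exact, 1 - B_n / (n * eps) ** 2, 1 - 1 / (4 * n * eps ** 2)


# Hypothetical probabilities varying from trial to trial, between 0.1 and 0.9.
ps = [0.1 + 0.8 * (i % 10) / 9 for i in range(200)]
print(poisson_check(ps, 0.1))
```

Both limits are far from sharp, which is the usual price of Tshebysheff's lemma. They are, however, enough for the limit statement of the theorem.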

2. Let us consider n series, each consisting of s independent trials with the constant probability p. Also, let

$$m_1, m_2, \ldots, m_n$$

represent the numbers of successes in each of these n series. The stochastic variables

$$x_1 = (m_1 - sp)^2, \quad x_2 = (m_2 - sp)^2, \quad \ldots, \quad x_n = (m_n - sp)^2$$

are independent and identical. Their common mathematical expectation is spq. The law of large numbers can be applied to these variables and leads immediately to the conclusion: The probability of the inequality

$$\left|\frac{\sum_{i=1}^{n}(m_i - sp)^2}{n} - spq\right| < \varepsilon$$

can be brought as near as we please to 1 (or certainty) if the number of series n is sufficiently large.
Substituting εspq for ε and dividing through by spq, we may state the same proposition as follows: The probability of the inequalities

$$1 - \varepsilon < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq} < 1 + \varepsilon,$$

where N = ns is the total number of trials in all n series, can be brought as near to 1 as we please if the number of series is sufficiently large.

The law of large numbers can be legitimately applied also to the variables

$$x_i = |m_i - sp|; \qquad i = 1, 2, 3, \ldots$$

with the common mathematical expectation

$$M_s = 2\mu\binom{s}{\mu}p^{\mu}q^{s-\mu+1},$$

where $\mu = [sp] + 1$, and leads to the following proposition: The probability of the inequalities

$$1 - \varepsilon < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \varepsilon$$

can be brought as near to 1 as we please if the number of series is sufficiently large.

For the sake of simplicity, let us use the notations

$$A = \sqrt{\frac{\sum_{i=1}^{n}(m_i - sp)^2}{n}}, \qquad B = \frac{\sum_{i=1}^{n}|m_i - sp|}{n}.$$

The probabilities P and P' of the inequalities

$$\sqrt{spq}\,(1 - \sigma) < A < \sqrt{spq}\,(1 + \sigma), \tag{1}$$

$$M_s(1 - \sigma) < B < M_s(1 + \sigma), \tag{2}$$

which are equivalent to

$$(1 - \sigma)^2 < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{nspq} < (1 + \sigma)^2, \qquad 1 - \sigma < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \sigma,$$
can both be made greater than $1 - \eta$, where η is an arbitrarily small positive number. The probability of the simultaneous materialization of (1) and (2) is not less than

$$P + P' - 1 > 1 - 2\eta.$$

But whenever (1) and (2) hold simultaneously, we have

$$\frac{\sqrt{spq}}{M_s}\cdot\frac{1 - \sigma}{1 + \sigma} < \frac{A}{B} < \frac{\sqrt{spq}}{M_s}\cdot\frac{1 + \sigma}{1 - \sigma}. \tag{3}$$

Therefore the probability of these inequalities is again $> 1 - 2\eta$. Now let us take

$$\sigma = \frac{\tau}{2 + \tau},$$

where τ is another positive number arbitrarily chosen. Then

$$\frac{1 + \sigma}{1 - \sigma} = 1 + \tau, \qquad \frac{1 - \sigma}{1 + \sigma} = \frac{1}{1 + \tau} > 1 - \tau.$$

Hence, the inequalities

$$(1 - \tau)\frac{\sqrt{spq}}{M_s} < \frac{A}{B} < (1 + \tau)\frac{\sqrt{spq}}{M_s}$$

follow from inequalities (3), and their probability is a fortiori $> 1 - 2\eta$. It suffices to take

$$\tau = \frac{\varepsilon M_s}{\sqrt{spq}}$$

to arrive at the following proposition: The probability of the inequality

$$\left|\frac{A}{B} - \frac{\sqrt{spq}}{M_s}\right| < \varepsilon$$

for a fixed ε and a sufficiently large number of series can be made as near to 1 as we please.

If spq is somewhat large, the quotient

$$\frac{\sqrt{spq}}{M_s}$$

differs but little from $\sqrt{\pi/2}$ (see Chap. IX, Prob. 2, page 177). Hence, when the number of series is large and the series themselves are sufficiently long, we may expect with great probability that the quotient

$$\frac{A}{B}$$

will not differ much from $\sqrt{\pi/2}$.
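The mean absolute deviation $M_s = E|m - sp|$ of a binomial number of successes has the closed form $2\mu\binom{s}{\mu}p^{\mu}q^{s-\mu+1}$ with $\mu = [sp] + 1$. The sketch below compares this closed form with direct summation for a few small (s, p) pairs, which are arbitrary choices. It then shows that $\sqrt{spq}/M_s$ is close to $\sqrt{\pi/2} \approx 1.2533$ when spq is large. The closed form is evaluated through `math.lgamma` so that large s does not overflow floating point.

```python
import math


def mean_abs_dev_direct(s, p):
    """E|m - sp| for m binomial(s, p), by direct summation (small s only)."""
    q = 1 - p
    return sum(math.comb(s, k) * p ** k * q ** (s - k) * abs(k - s * p)
               for k in range(s + 1))


def mean_abs_dev_formula(s, p):
    """M_s = 2 mu C(s, mu) p^mu q^(s - mu + 1), mu = [sp] + 1, computed with logarithms."""
    q = 1 - p
    mu = math.floor(s * p) + 1
    log_m = (math.log(2 * mu) + math.lgamma(s + 1) - math.lgamma(mu + 1)
             - math.lgamma(s - mu + 1) + mu * math.log(p) + (s - mu + 1) * math.log(q))
    return math.exp(log_m)


for s, p in [(10, 0.3), (25, 0.5), (40, 0.1), (7, 0.77)]:
    print(s, p, mean_abs_dev_direct(s, p), mean_abs_dev_formula(s, p))

# For large spq the ratio sqrt(spq) / M_s approaches sqrt(pi / 2).
s, p = 10_000, 0.3
print(math.sqrt(s * p * (1 - p)) / mean_abs_dev_formula(s, p), math.sqrt(math.pi / 2))
```

When sp happens to be an integer, rounding in `s * p` could shift μ by one. This does no harm, because the closed form gives the same value for μ = sp and μ = sp + 1 in that case.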



Divergence Coefficient

3. The considerations of the preceding section can be generalized. Let us consider again n series containing s trials each, and let

$$m_1, m_2, \ldots, m_n$$

represent the numbers of successes in each of these series. Without specifying the nature of the trials (which can be independent or dependent) we shall denote by p the mean probability in all N = ns trials and by q = 1 - p its complement. Again considering the quotient

$$Q = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq},$$

we seek its mathematical expectation

$$E(Q) = D.$$

When all the N trials are of the Bernoullian type, D = 1. But it is also possible to imagine cases when D > 1 or D < 1. Lexis calls $\sqrt{D}$ the "coefficient of dispersion." We shall call D itself the "theoretical divergence coefficient." If $m_1, m_2, \ldots, m_n$ are actually observed frequencies in n series, the quotient

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

may be called the "empirical divergence coefficient." Then, if the law of large numbers can be applied to the variables

$$x_i = \frac{(m_i - sp)^2}{spq}; \qquad i = 1, 2, 3, \ldots,$$

we can expect, with probability approaching certainty as near as we please, that the inequality

$$|D' - D| < \varepsilon$$

will be fulfilled for an adequately large number of series.

Thus far we have not specified the nature of the trials. Now we shall suppose that all N = ns trials, distributed in n series, are independent but with probabilities varying in general from trial to trial. Let

$$p_{i1}, p_{i2}, \ldots, p_{is} \qquad (i = 1, 2, \ldots, n)$$

be the probabilities in successive trials of the ith series. Their mean

$$p_i = \frac{p_{i1} + p_{i2} + \cdots + p_{is}}{s}$$

is the mean probability in the ith series. Finally

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

is the mean probability in all N = ns trials. As to the expectation of $(m_i - sp)^2$, we find

$$E(m_i - sp)^2 = E\{m_i - sp_i + s(p_i - p)\}^2 = E(m_i - sp_i)^2 + s^2(p_i - p)^2,$$

since

$$E(m_i - sp_i) = 0.$$

On the other hand,

$$E(m_i - sp_i)^2 = \sum_{j=1}^{s} p_{ij}q_{ij} = sp_i - \sum_{j=1}^{s} p_{ij}^2$$

and

$$\sum_{j=1}^{s}(p_i - p_{ij})^2 = -sp_i^2 + \sum_{j=1}^{s} p_{ij}^2,$$

whence

$$E(m_i - sp_i)^2 = sp_i - sp_i^2 - \sum_{j=1}^{s}(p_i - p_{ij})^2.$$

Now, letting i take the values 1, 2, ..., n and taking the sum of the results, we get

$$\sum_{i=1}^{n} E(m_i - sp_i)^2 = nsp - s\sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ij})^2.$$

But

$$s\sum_{i=1}^{n}(p_i - p)^2 = -nsp^2 + s\sum_{i=1}^{n} p_i^2,$$

whence finally

$$\sum_{i=1}^{n} E(m_i - sp)^2 = nspq + s(s-1)\sum_{i=1}^{n}(p_i - p)^2 - \sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ij})^2$$

and

$$D = 1 + \frac{s-1}{npq}\sum_{i=1}^{n}(p_i - p)^2 - \frac{1}{nspq}\sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ij})^2.$$

Two particular cases deserve special attention.

Lexis' Case. The probabilities remain the same within each series but vary from series to series. In this case $p_{ij} = p_i$, and the expression of D becomes

$$D = 1 + \frac{s-1}{npq}\sum_{i=1}^{n}(p - p_i)^2.$$

The theoretical divergence coefficient in this case is always greater than 1 (unless all the $p_i$ are equal) and may be arbitrarily large.

Poisson's Case. The probabilities of the corresponding trials in all series are the same, so that

$$p_{ij} = \pi_j$$

and

$$p = p_i = \frac{\pi_1 + \pi_2 + \cdots + \pi_s}{s}.$$

In this case the divergence coefficient

$$D = 1 - \frac{\sum_{j=1}^{s}(p - \pi_j)^2}{spq}$$

is always less than 1 (unless all the $\pi_j$ are equal).

Since the law of large numbers evidently is applicable to the variables

$$x_i = \frac{(m_i - sp)^2}{spq},$$

we may expect that the empirical divergence coefficient D' will not differ much from D if the number of series is sufficiently large.

For numerical illustration let us consider 100 series, each containing 100 trials, such that in 50 series the probability is 2/5 and in the remaining 50 series it is 3/5. Here we evidently have Lexis' case. The mean probability in all trials is

$$p = \tfrac{1}{2},$$

and

$$\sum_{i=1}^{100}\left(\tfrac{1}{2} - p_i\right)^2 = 50\cdot\tfrac{1}{100} + 50\cdot\tfrac{1}{100} = 1.$$

Finally,

$$D = 1 + \tfrac{99}{25} = 4.96.$$

Now suppose that we combine in pairs series of 100 trials with probability 2/5 and series of 100 trials with probability 3/5 to form 50 series, each of 200 trials. Evidently we have here Poisson's case. The mean probability in each series again is $p = \tfrac{1}{2}$, and

$$\sum_{j=1}^{200}\left(\tfrac{1}{2} - \pi_j\right)^2 = 100\cdot\tfrac{1}{100} + 100\cdot\tfrac{1}{100} = 2.$$

Finally,

$$D = 1 - \tfrac{1}{25} = 0.96.$$
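The general expression for D derived above for independent trials lends itself to a direct computation. This Python sketch takes a table of probabilities $p_{ij}$ (series i, trial j) and evaluates D in exact rational arithmetic. It reproduces the two values of the numerical illustration, 4.96 for Lexis' case and 0.96 for Poisson's case. Representing the series as nested lists is simply a convenient choice.

```python
from fractions import Fraction as F


def divergence_coefficient(probs):
    """Theoretical D for independent trials; probs[i][j] = p_ij (series i, trial j)."""
    n, s = len(probs), len(probs[0])
    p_series = [sum(row) / s for row in probs]  # p_i
    p = sum(p_series) / n                       # mean probability in all ns trials
    q = 1 - p
    between = sum((pi - p) ** 2 for pi in p_series)
    within = sum((pi - pij) ** 2 for pi, row in zip(p_series, probs) for pij in row)
    return 1 + (s - 1) * between / (n * p * q) - within / (n * s * p * q)


lo, hi = F(2, 5), F(3, 5)
lexis = [[lo] * 100] * 50 + [[hi] * 100] * 50  # 100 series of 100 trials each
poisson = [[lo] * 100 + [hi] * 100] * 50       # 50 series of 200 trials each
print(divergence_coefficient(lexis), divergence_coefficient(poisson))  # 124/25 24/25
```

As $124/25 = 4.96$ and $24/25 = 0.96$, the same 10,000 trials give a divergence coefficient well above or slightly below 1, depending only on how they are grouped into series.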

The consideration of the divergence coefficient may be useful in 
testing the assumed independence of trials and values of probabilities 
attached to these trials. In the simplest case of Bernoullian trials with 
a constant and known probability, the theoretical divergence coefficient 
is 1. Now, if the number of series is sufficiently large and the empirical 
divergence coefficient turns out to be considerably different from 1, 
we must admit with great probability that the trials we deal with are not 
of the supposed type. If, however, the empirical divergence coefficient
turns out to be near 1, that does not conclusively prove the hypothesis 
concerning the independence of trials and the assumed value of the 
probability. It only makes this hypothesis plausible. 

There are cases of dependent trials (complex chains considered by 
Markoff) in which the theoretical divergence coefficient is exactly 1 and 
the probability of an event has the same constant value in each trial, 
insofar as the results of other trials remain unknown. Cases like that 
may easily be mistaken for Bernoullian trials without further detailed 
study of the entire course of trials. 

4. When there is good reason to believe that the trials are independent with a constant but unknown probability, we cannot, in all rigor, find the value of the empirical divergence coefficient

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

to compare it with the theoretical divergence coefficient $D = 1$, since $p$ remains unknown.

But, relying on Bernoulli's theorem, we can take the quotient $M/N$, where

$$M = m_1 + m_2 + \cdots + m_n,$$

as an approximate value of $p$. By taking $p = M/N$ in the preceding expression for $D'$, we get another number
$$D'' = \frac{N\sum_{i=1}^{n}\left(m_i - \frac{sM}{N}\right)^2}{M(N-M)},$$

which in general is close to $D'$. However, considering $m_1, m_2, \ldots, m_n$ not as observed but as eventual numbers of successes in $n$ series, the mathematical expectation of $D''$ is different from 1. To avoid this difficulty, it is better to consider a slightly different quotient

$$Q = \frac{n(N-1)\sum_{i=1}^{n}\left(m_i - \frac{sM}{N}\right)^2}{(n-1)M(N-M)}.$$

For this quotient there exists a theorem discovered and proved for the first time by the eminent Russian statistician Tschuprow.

Theorem. The mathematical expectation of $Q$ is rigorously equal to 1.¹
Proof. Here we shall develop the proof given by Markoff. The above expression of $Q$ takes the form 0/0, and therefore has no meaning, in two cases: $M = 0$ or $M = N$. For these exceptional cases we set $Q = 1$ by definition. If neither $M = 0$ nor $M = N$, we can present $Q$ in the form


$$Q = \frac{n(N-1)}{n-1}\cdot\frac{\sum_{i=1}^{n} m_i^2 - \frac{M^2}{n}}{M(N-M)}. \tag{4}$$


Considering $m_1, m_2, \ldots, m_n$ as stochastic variables assuming integral values from 0 to $s$, the probability of a definite system of values

$$m_1, m_2, \ldots, m_n$$

is

$$P = \frac{s!}{m_1!(s-m_1)!}\cdot\frac{s!}{m_2!(s-m_2)!}\cdots\frac{s!}{m_n!(s-m_n)!}\,p^M q^{N-M}.$$




To get the expectation of $Q$ we must multiply it by $P$ and take the sum

$$E(Q) = \sum PQ$$

extended over all non-negative integers $m_1, m_2, \ldots, m_n$, each of them not exceeding $s$. To perform this multiple summation we first collect all terms with a given sum

$$m_1 + m_2 + \cdots + m_n = M.$$

¹ The theorem itself and its proof given by Markoff can be extended to the case of series of unequal length.
Let the result of this summation be $S_M$. Then it remains to take the sum

$$\sum_{M=0}^{N} S_M$$

to obtain the desired expression of $E(Q)$. To this end we first separate the two terms corresponding to $M = 0$ and $M = N$. In the former case

$$m_1 = m_2 = \cdots = m_n = 0,$$

and the probability of such an event is $q^N$, while $Q = 1$. In the latter case

$$m_1 = m_2 = \cdots = m_n = s,$$

the probability of which is $p^N$, while again $Q = 1$. Thus

$$E(Q) = p^N + q^N + \sum_{M=1}^{N-1} S_M.$$

To find $S_M$ we observe that the denominator of $Q$ has a constant value when the summation is performed over variable integers $m_1, m_2, \ldots, m_n$ connected by the relation

$$m_1 + m_2 + \cdots + m_n = M.$$

Hence it suffices to find two sums

$$\sum P \quad\text{and}\quad \sum P m_1^2$$

extended over integers $m_1, m_2, \ldots, m_n$ varying between 0 and $s$ and having the sum $M$. By symmetry, $\sum P m_i^2$ is the same for every $i$. To this end consider the function

$$V = (pe^{t+\xi_1} + q)^s (pe^{t+\xi_2} + q)^s \cdots (pe^{t+\xi_n} + q)^s,$$

involving $n + 1$ arbitrary variables $t, \xi_1, \xi_2, \ldots, \xi_n$. When developed, $V$ consists of terms of the form

$$P\,e^{Mt + m_1\xi_1 + m_2\xi_2 + \cdots + m_n\xi_n}.$$

Evidently we obtain the sum $\sum P$ by setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$ and taking the coefficient of $e^{Mt}$ in the expansion

$$(pe^t + q)^N.$$

Thus

$$\sum P = \frac{N!}{M!(N-M)!}\,p^M q^{N-M}. \tag{5}$$

To find $\sum P m_1^2$ take the second derivative

$$\frac{\partial^2 V}{\partial \xi_1^2},$$

and after setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, expand

$$\left(\frac{\partial^2 V}{\partial \xi_1^2}\right)_{\xi_1 = \cdots = \xi_n = 0}$$

and take the coefficient of $e^{Mt}$. Thus we find

$$\sum P m_1^2 = \left[\frac{s\,(N-1)!}{(M-1)!\,(N-M)!} + \frac{s(s-1)\,(N-2)!}{(M-2)!\,(N-M)!}\right]p^M q^{N-M}. \tag{6}$$

Referring to (4), (5), and (6), we easily get

$$S_M = \frac{n(N-1)}{(n-1)M(N-M)}\cdot\frac{N\,(N-2)!}{n\,(M-1)!\,(N-M)!}\bigl[n(N-1) + (N-n)(M-1) - M(N-1)\bigr]p^M q^{N-M};$$

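or, since the expression in brackets reduces to $(n-1)(N-M)$, after obvious simplifications,

$$S_M = \frac{N!}{M!\,(N-M)!}\,p^M q^{N-M}.$$

Hence

$$\sum_{M=1}^{N-1} S_M = (p+q)^N - p^N - q^N = 1 - p^N - q^N,$$

and finally

$$E(Q) = 1.$$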
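Tschuprow's theorem can be checked by brute force in a few small cases. The sketch below sums $P\cdot Q$ over every system of values $m_1, \ldots, m_n$, using the probability $P$ given above and the convention $Q = 1$ when $M = 0$ or $M = N$. The helper names are hypothetical, and the values of $n$, $s$ and $p$ are arbitrary choices.

```python
from fractions import Fraction
from itertools import product
from math import comb

def tschuprow_q(ms, s):
    """Tschuprow's quotient Q for frequencies ms in n series of s trials each."""
    n = len(ms)
    N, M = n * s, sum(ms)
    if M == 0 or M == N:
        return Fraction(1)                       # Q is set equal to 1 by definition
    center = Fraction(s * M, N)                  # s M / N
    spread = sum((m - center) ** 2 for m in ms)
    return n * (N - 1) * spread / ((n - 1) * M * (N - M))

def expected_q(n, s, p):
    """Exact E(Q), summing P * Q over all systems of values m_1, ..., m_n."""
    q, N = 1 - p, n * s
    total = Fraction(0)
    for ms in product(range(s + 1), repeat=n):
        M = sum(ms)
        prob = p ** M * q ** (N - M)
        for m in ms:
            prob *= comb(s, m)                   # s! / (m! (s - m)!)
        total += prob * tschuprow_q(ms, s)
    return total

print(expected_q(3, 2, Fraction(1, 3)))   # 1
print(expected_q(4, 3, Fraction(2, 7)))   # 1
```

Because the arithmetic is exact, the result is exactly 1 rather than merely close to it.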


Markoff, using the same method, succeeded in finding the explicit expression of the expectation

$$E(Q - 1)^2.$$

Since there is no difficulty in finding this expression except for somewhat tedious calculations, we give it here without entering into details of the proof:

P(fi 1 ^. _ 2mN-n) ^ V-l .iV-Af-1 „ 

~ in- l)iN - 2)iN - 3)^ M N - M ^ ’ 


whence the following inequality immediately follows:

$$E(Q - 1)^2 < \frac{2N(N-n)}{(n-1)(N-2)(N-3)}.$$

In case $n \ge 5$ a still simpler inequality holds, because then $N(N-n) < (N-2)(N-3)$:

$$E(Q - 1)^2 < \frac{2}{n-1}. \tag{7}$$
Let $R$ be the probability of the inequality

$$Q \ge 1 + \varepsilon,$$

where $\varepsilon$ is a positive number. Applying to inequality (7) the same reasoning as was used in establishing Tshebysheff's lemma, we find that

$$R < \frac{2}{(n-1)\varepsilon^2}.$$

Likewise, denoting by $R'$ the probability of the inequality

$$Q \le 1 - \varepsilon,$$

we have

$$R' < \frac{2}{(n-1)\varepsilon^2}.$$

Thus, in a large number of series it becomes very unlikely that the value of $Q$ found in actual experiment would lie outside the interval $(1 - \varepsilon, 1 + \varepsilon)$. For instance, the probability of $Q \ge 2$ in 100 series is surely less than

$$\frac{2}{99},$$

or nearly 0.02. However, this limit is much too high. It would be highly desirable to have a good approximate expression for the probability of either one of the inequalities

$$Q \ge 1 + \varepsilon \quad\text{or}\quad Q \le 1 - \varepsilon.$$

But this important and difficult problem has not yet been solved.

5. In order to illustrate the foregoing theoretical considerations we turn to experiments reported by Charlier in his book "Vorlesungen über die Grundzüge der mathematischen Statistik" (Lund, 1920). He made 10,000 drawings of single cards from a complete deck of 52 cards (each card taken being returned before the next drawing) and noted the frequency of black cards. The drawings were divided into 1,000 series of 10 cards, or into 200 series of 50 cards. The results are given in the tables on page 220.

Assuming the independence of trials and the constant probability $p = 1/2$, the theoretical divergence coefficient must be 1. Let us compare it with the empirical divergence coefficients derived from Tables I and II. To this end we multiply the squares of the numbers in the second column by the numbers in the third column and add the products. The results are:

For 200 series of 50 cards: $\sum(m_i - ps)^2 = 2{,}487$
For 1,000 series of 10 cards: $\sum(m_i - ps)^2 = 2{,}419$
Table I. Number of Black Cards in 200 Groups of 50 Cards Each

Frequency  Difference  Number of groups
m          m - 25      with these frequencies
14         -11         ?
15         -10         ?
16          -9         ?
17          -8         ?
18          -7         ?
19          -6         8
20          -5         6
21          -4         15
22          -3         13
23          -2         15
24          -1         34
25           0         14
26           1         21
27           2         26
28           3         14
29           4         10
30           5         5
31           6         5
32           7         3
33           8         2

(Entries marked ? are illegible in the source.)


Table II. Number of Black Cards in 1,000 Groups of 10 Cards Each

Frequency  Difference  Number of groups
m          m - 5       with these frequencies
0          -5          3
1          -4          10
2          -3          43
3          -2          116
4          -1          221
5           0          247
6           1          202
7           2          115
8           3          34
9           4          9
10          5          0


Dividing these numbers by $10{,}000 \cdot \frac{1}{4} = 2{,}500$, we get the following empirical divergence coefficients:

$D' = 0.9948$ (200 series of 50 cards); $D' = 0.9676$ (1,000 series of 10 cards).

Both are close to 1, so the hypotheses of independence of trials and of a constant probability for each of them are in good agreement with the empirical results. The second divergence coefficient, corresponding to the more numerous groups, differs from 1 more than the first, corresponding to only 200 groups. But such a difference can be accounted for by fluctuations due to chance.
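The computation for Table II can be reproduced with a short sketch. It is illustrative only, and the function name is hypothetical. The table is stored as a mapping from the frequency $m$ to the number of groups, and exact fractions keep the ratio unrounded.

```python
from fractions import Fraction

# Table II: number m of black cards in a group of 10 -> number of groups
TABLE_II = {0: 3, 1: 10, 2: 43, 3: 116, 4: 221, 5: 247,
            6: 202, 7: 115, 8: 34, 9: 9, 10: 0}

def empirical_divergence(table, s, p):
    """Return (sum of (m - s p)^2 over all series, D') for a table {m: number of series}."""
    n = sum(table.values())          # number of series
    N = n * s                        # total number of trials
    squares = sum(count * (m - s * p) ** 2 for m, count in table.items())
    return squares, squares / (N * p * (1 - p))

squares, d_prime = empirical_divergence(TABLE_II, s=10, p=Fraction(1, 2))
print(squares, float(d_prime))   # 2419 0.9676
```

Table I cannot be treated the same way because several of its entries are illegible.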

Series of 50 trials are long enough to test the theorem established in Sec. 2 of this chapter. The quantities denoted there by $A$ and $B$ are here, correspondingly,

$A = 3.5263$; $B = 561/200 = 2.805$,

whence

$$\frac{A}{B} = 1.2571,$$

while

$$\sqrt{\pi/2} = 1.2533.$$

Again the difference, only about $4 \cdot 10^{-3}$, is rather small.

In this example, the probability of drawing a black card was assumed to be 1/2. In case we do not know the probability but suppose it to be constant throughout 10,000 independent trials, we must consider the coefficient

$$Q = \frac{n(N-1)\sum_{i=1}^{n}\left(m_i - \frac{sM}{N}\right)^2}{(n-1)M(N-M)}.$$

In our example

$n = 1{,}000$; $N = 10{,}000$; $M = 4{,}933$;
$s = 10$; $sM/N = 4.933$.

To evaluate the sum

$$S = \sum_{i=1}^{1{,}000}(m_i - 4.933)^2$$

we write it in the form

$$S = \sum_{i=1}^{1{,}000}(m_i - 5)^2 + 0.134\sum_{i=1}^{1{,}000}(m_i - 5) + 1{,}000\cdot(0.067)^2.$$

Now

$$\sum_{i=1}^{1{,}000}(m_i - 5)^2 = 2{,}419, \qquad 0.134\sum_{i=1}^{1{,}000}(m_i - 5) = -8.978, \qquad 1{,}000\cdot(0.067)^2 = 4.489,$$

so that

$$S = 2{,}414.51.$$

This is to be multiplied by the number

$$\frac{n(N-1)}{(n-1)M(N-M)} = \frac{1}{2497.3}.$$

The result is

$$Q = 0.9668,$$

near enough to 1 for us to consider the hypothesis of independence of trials and of a constant probability as in agreement with the experimental data.
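The value of $Q$ for Charlier's data can be recomputed from Table II with the following sketch. The function name is illustrative, and fractions are used for the intermediate sums.

```python
from fractions import Fraction

# Table II: number m of black cards in a group of 10 -> number of groups
TABLE_II = {0: 3, 1: 10, 2: 43, 3: 116, 4: 221, 5: 247,
            6: 202, 7: 115, 8: 34, 9: 9, 10: 0}

def tschuprow_q_from_table(table, s):
    """Return M, the sum of (m_i - sM/N)^2, and Tschuprow's Q for a frequency table."""
    n = sum(table.values())                       # number of series
    N = n * s                                     # number of trials
    M = sum(m * c for m, c in table.items())      # total number of successes
    center = Fraction(s * M, N)                   # s M / N
    spread = sum(c * (m - center) ** 2 for m, c in table.items())
    return M, spread, n * (N - 1) * spread / ((n - 1) * M * (N - M))

M, spread, q_value = tschuprow_q_from_table(TABLE_II, s=10)
print(M, float(spread), round(float(q_value), 4))   # 4933 2414.511 0.9668
```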


Examples of Dependent Trials

6. So far we have dealt only with independent variables. But the law of large numbers holds, under certain conditions, even in the case of dependent variables. Leaving aside generalities, we shall show the application of the law of large numbers to a few interesting problems involving dependent variables.

Let us consider first a Bernoullian series consisting of $n + 1$ independent trials with the same probability $p$ for an event $E$, the opposite event being denoted by $F$. We associate with trials $1, 2, \ldots, n$ variables $x_1, x_2, \ldots, x_n$ defined as follows:

$x_i = 1$ if $E$ occurs in trials $i$ and $i + 1$,
$x_i = 0$ in all other cases.

The probability of $x_i = 1$ evidently is $p^2$ when nothing is known about the values of the other variables. But if we know that $x_{i-1} = 1$, which implies the occurrence of $E$ in the $i$th trial, then the probability of $x_i = 1$ is $p$. Thus, consecutive variables are dependent. However, $x_i$ and $x_k$ are independent if $|k - i| > 1$, as we can easily see. Since

$$E(x_i) = E(x_i^2) = p^2\cdot 1 + (1 - p^2)\cdot 0 = p^2,$$

the expectation of the sum $x_1 + x_2 + \cdots + x_n$ will be

$$E(x_1 + x_2 + \cdots + x_n) = np^2.$$

As to the dispersion of this sum, it can be expressed as follows:

$$B_n = \sum_{i=1}^{n} E(x_i - p^2)^2 + 2\sum_{j>i} E(x_i - p^2)(x_j - p^2).$$

Now

$$E(x_i - p^2)^2 = E(x_i^2) - 2p^2E(x_i) + p^4 = p^2(1 - p^2) \tag{8}$$

and

$$E(x_i - p^2)(x_j - p^2) = E(x_i - p^2)\cdot E(x_j - p^2) = 0 \tag{9}$$

for $j > i + 1$, because then $x_i$ and $x_j$ are independent. But

$$E(x_i - p^2)(x_{i+1} - p^2) = E(x_i x_{i+1}) - p^4 = p^3 - p^4, \tag{10}$$

since the probability of the simultaneous events

$$x_i = 1, \quad x_{i+1} = 1$$
is $p^3$. Taking into account (8), (9), and (10), we find

$$B_n = np^2q(3p + 1) - 2p^3q,$$

and the condition

$$\frac{B_n}{n^2} \to 0 \quad\text{as } n \to \infty$$

is satisfied. Hence, the law of large numbers holds for the variables $x_1, x_2, \ldots, x_n$. To express it in the simplest form, it suffices to notice that the sum

$$x_1 + x_2 + \cdots + x_n$$

represents the number of pairs $EE$ occurring in consecutive trials of the Bernoullian series of $n + 1$ trials. Let us denote the frequency of such pairs by $m$. Then, referring to the law of large numbers, we get the following proposition:

If in $n$ consecutive pairs of Bernoullian trials the frequency of double successes $EE$ is $m$, then the probability of the inequality

$$\left|\frac{m}{n} - p^2\right| < \varepsilon$$

will approach 1 as closely as we please when $n$ becomes sufficiently large.
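For small $n$ the exact values of $E(m)$ and $B_n$ can be checked by enumerating all $2^{n+1}$ outcomes, as in this sketch. The function names and the choice $n = 8$, $p = 2/5$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def pair_count_moments(n, p):
    """Exact mean and variance of m, the number of EE pairs in n + 1 Bernoulli trials."""
    q = 1 - p
    mean = second = Fraction(0)
    for seq in product((0, 1), repeat=n + 1):           # 1 stands for E, 0 for F
        k = sum(seq)
        prob = p ** k * q ** (n + 1 - k)
        m = sum(seq[i] * seq[i + 1] for i in range(n))   # x_1 + ... + x_n
        mean += prob * m
        second += prob * m * m
    return mean, second - mean ** 2

def dispersion_formula(n, p):
    """B_n = n p^2 q (3p + 1) - 2 p^3 q, as obtained from (8), (9), (10)."""
    q = 1 - p
    return n * p ** 2 * q * (3 * p + 1) - 2 * p ** 3 * q

n, p = 8, Fraction(2, 5)
mean, var = pair_count_moments(n, p)
print(mean == n * p ** 2, var == dispersion_formula(n, p))   # True True
```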

7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good example of dependent trials to which the law of large numbers can be applied. Let $p_1$ be the given probability of an event $E$ in the first trial. According to the definition of a simple chain, the probability of $E$ in any subsequent trial is $\alpha$ or $\beta$ according as $E$ occurred or failed to occur in the preceding trial. By $p_n$ we denote the probability for $E$ to occur in the $n$th trial when the results of other trials are unknown. Let

$$\delta = \alpha - \beta, \qquad p = \frac{\beta}{1 - \delta}.$$

Then, according to the developments in Chap. V, Sec. 2,

$$p_n = p + (p_1 - p)\delta^{n-1},$$

whence

$$\frac{p_1 + p_2 + \cdots + p_n}{n} = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta},$$

barring the trivial cases $\delta = 1$ or $\delta = -1$. It follows that $p$ represents the limit of the mean probability in $n$ trials when $n$ increases indefinitely, and for that reason $p$ may be called the mean probability in an infinite chain of trials. When it is known that $E$ has occurred in the $i$th trial, its probability of occurring in some subsequent $j$th trial is given by

$$p_j^{(i)} = p + q\delta^{j-i}, \qquad q = 1 - p.$$

In the usual way we associate with trials $1, 2, 3, \ldots$ variables $x_1, x_2, x_3, \ldots$ so that in general

$x_i = 1$ when $E$ occurs in the $i$th trial,
$x_i = 0$ when $E$ fails to occur in the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p_i.$$

In order to prove that the law of large numbers can be applied to the variables $x_1, x_2, x_3, \ldots$, we must have an idea of the behavior of $B_n$ for large $n$. By definition

$$B_n = E(x_1 - p_1 + x_2 - p_2 + \cdots + x_n - p_n)^2 = \sum_{i=1}^{n}E(x_i - p_i)^2 + 2\sum_{j>i}E[(x_i - p_i)(x_j - p_j)].$$

The first sum can easily be found. We have

$$E(x_i - p_i)^2 = p_i - p_i^2 = pq + (q - p)(p_1 - p)\delta^{i-1} - (p_1 - p)^2\delta^{2i-2},$$

whence

$$A = \sum_{i=1}^{n}E(x_i - p_i)^2 \sim npq,$$

neglecting terms which remain bounded. As to the second sum, we observe first that

$$E(x_i - p_i)(x_j - p_j) = E(x_ix_j) - p_ip_j.$$

Again, since the probability of

$$x_ix_j = 1$$

is evidently $p_ip_j^{(i)}$, we have

$$E(x_ix_j) = p_ip_j^{(i)}$$

and

$$E(x_i - p_i)(x_j - p_j) = p_i\bigl(p_j^{(i)} - p_j\bigr) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1} - (p_1 - p)^2\delta^{i+j-2}.$$

Now, for a fixed $i = 1, 2, \ldots, n - 1$, we must take the sum of these expressions letting $j$ run over $i + 1, i + 2, \ldots, n$. The result of this summation is

$$pq\,\frac{\delta - \delta^{n-i+1}}{1 - \delta} + (p_1 - p)(q - p)\,\frac{\delta^i - \delta^n}{1 - \delta} - (p_1 - p)^2\,\frac{\delta^{2i-1} - \delta^{n+i-1}}{1 - \delta}.$$
Taking $i = 1, 2, 3, \ldots, n - 1$ and neglecting in the sum the terms which remain bounded, we get

$$B = \sum_{j>i}E(x_i - p_i)(x_j - p_j) \sim npq\,\frac{\delta}{1 - \delta},$$

whence

$$B_n = A + 2B \sim npq\,\frac{1 + \delta}{1 - \delta}.$$

This asymptotic equality suffices to show that

$$\frac{B_n}{n^2} \to 0 \quad\text{as } n \to \infty.$$

Therefore the law of large numbers can be applied to the variables $x_1, x_2, x_3, \ldots$. Since the sum

$$x_1 + x_2 + \cdots + x_n = m$$

represents the frequency of $E$ in $n$ trials, the law of large numbers in this particular case can be stated as follows: For a fixed $\varepsilon > 0$, no matter how small, the probability of the inequality

$$\left|\frac{m}{n} - \frac{p_1 + p_2 + \cdots + p_n}{n}\right| < \varepsilon$$

tends to 1 as $n \to \infty$.
The arithmetic mean

$$\frac{p_1 + p_2 + \cdots + p_n}{n}$$

itself approaches the limit $p$. It is easy then to express the preceding theorem thus: The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon$$

tends to 1 as $n \to \infty$.

This proposition is of exactly the same type as Bernoulli's theorem, 
but applies to series of dependent trials. 
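The asymptotic result $B_n \sim npq(1+\delta)/(1-\delta)$ can be compared with an exact computation. The sketch below works trial by trial. At each step it carries the probability of the last outcome, together with the first two moments of $m$ restricted to that outcome, and this yields the exact mean and variance of $m$ for a simple chain. The function name is hypothetical, floating-point arithmetic is used, and the values $\alpha = 1/2$, $\beta = 1/4$, $p_1 = 1/2$ are arbitrary choices for illustration.

```python
def chain_moments(n, p1, alpha, beta):
    """Exact mean and variance of m, the frequency of E in n trials of a simple chain.

    alpha, beta: probability of E after E and after F; p1: probability of E in trial 1.
    """
    trans = {1: {1: alpha, 0: 1 - alpha},       # trans[e][f] = P(next = f | last = e),
             0: {1: beta, 0: 1 - beta}}         # with 1 standing for E and 0 for F
    prob = {1: p1, 0: 1 - p1}                   # P(last outcome = e)
    m1 = {1: p1, 0: 0.0}                        # E[m; last outcome = e]
    m2 = {1: p1, 0: 0.0}                        # E[m^2; last outcome = e]
    for _ in range(n - 1):
        new_prob, new_m1, new_m2 = {}, {}, {}
        for f in (0, 1):                        # f^2 = f because f is 0 or 1
            new_prob[f] = sum(trans[e][f] * prob[e] for e in (0, 1))
            new_m1[f] = sum(trans[e][f] * (m1[e] + f * prob[e]) for e in (0, 1))
            new_m2[f] = sum(trans[e][f] * (m2[e] + 2 * f * m1[e] + f * prob[e])
                            for e in (0, 1))
        prob, m1, m2 = new_prob, new_m1, new_m2
    mean = m1[0] + m1[1]
    return mean, m2[0] + m2[1] - mean ** 2

alpha, beta, p1, n = 0.5, 0.25, 0.5, 10_000
delta = alpha - beta
p = beta / (1 - delta)                          # mean probability in an infinite chain
q = 1 - p
mean, var = chain_moments(n, p1, alpha, beta)
print(mean / n, p)                              # both close to 1/3
print(var / n, p * q * (1 + delta) / (1 - delta))   # both close to 10/27
```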

8. Let a simple chain of $N = ns$ trials be divided into $n$ consecutive series, each consisting of $s$ trials; also, let $m_1, m_2, \ldots, m_n$ be the frequencies of $E$ in each of these series. When $N$ is a large number, the mean probability in $N$ trials differs little from the quantity denoted by $p$. It is natural to modify the definition of the divergence coefficient given in Sec. 3 by taking $p$ instead of the variable mean probability in $N$ trials. Thus we define

$$D = \frac{\sum_{i=1}^{n}E(m_i - sp)^2}{Npq}.$$

In our case, the variables

$$X_1 = (m_1 - sp)^2, \quad X_2 = (m_2 - sp)^2, \quad \ldots, \quad X_n = (m_n - sp)^2$$

are neither identical nor independent, although the degree of dependence is evidently very slight. These variables can also be presented in the form

$$(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2, \tag{11}$$

taking successively $a = 1, s + 1, 2s + 1, \ldots, (n - 1)s + 1$.

To find the mathematical expectation of (11) it suffices to notice that

$$E(x_i - p)^2 = E(x_i - p_i)^2 + (p_i - p)^2 = pq + (q - p)(p_1 - p)\delta^{i-1},$$

$$E(x_i - p)(x_j - p) = E(x_i - p_i)(x_j - p_j) + (p_i - p)(p_j - p) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1} \quad (j > i),$$

and then proceed exactly as in the approximate evaluation of $B_n$ in Sec. 7.
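The final result is

$$E(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2 = spq\,\frac{1+\delta}{1-\delta} - \frac{2pq\delta}{(1-\delta)^2} + \frac{(q-p)(p_1-p)(1+\delta)}{(1-\delta)^2}\,\delta^{a-1} + \frac{2pq}{(1-\delta)^2}\,\delta^{s+1} - \frac{(q-p)(p_1-p)}{(1-\delta)^2}\bigl[2s(1-\delta) + 1 + \delta\bigr]\delta^{a+s-1}.$$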


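For somewhat large $s$ the last two terms in the right member are completely negligible; so is the third term if $a \ge s + 1$. Hence, with a good approximation,

$$E(X_1) = spq\,\frac{1+\delta}{1-\delta} - \frac{2pq\delta}{(1-\delta)^2} + \frac{(q-p)(p_1-p)(1+\delta)}{(1-\delta)^2},$$

$$E(X_i) = spq\,\frac{1+\delta}{1-\delta} - \frac{2pq\delta}{(1-\delta)^2} \quad\text{if } i > 1,$$

and

$$D = \frac{1+\delta}{1-\delta} - \frac{2\delta}{s(1-\delta)^2} + \frac{(q-p)(p_1-p)(1+\delta)}{Npq(1-\delta)^2}.$$

Again, when $N$ is large, the last term can be dropped, and as a good approximation to $D$ we can take

$$D = \frac{1+\delta}{1-\delta} - \frac{2\delta}{s(1-\delta)^2}. \tag{12}$$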

It can be shown that the law of large numbers holds for the variables $X_1, X_2, \ldots, X_n$. Therefore, when $n$ (the number of series) is large, the empirical divergence coefficient is not likely to differ considerably from $D$ as given by the above approximate formula.
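Formula (12) is easy to tabulate. The following minimal sketch evaluates it for the card experiments described in Secs. 9 and 10, where $\alpha$ and $\beta$ are the probabilities of a red card after a red card and after a black card. The function name is hypothetical.

```python
def chain_divergence(delta, s):
    """Approximate divergence coefficient (12) for series of s trials of a simple chain."""
    return (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

# (alpha, beta, s) for the experiment of Sec. 9 and the two experiments of Sec. 10
experiments = [(0.5, 0.25, 25), (0.65, 0.35, 20), (0.35, 0.65, 20)]
for alpha, beta, s in experiments:
    delta = alpha - beta
    p = beta / (1 - delta)
    print(f"p = {p:.4f}, D = {chain_divergence(delta, s):.3f}")
# p = 0.3333, D = 1.631
# p = 0.5000, D = 1.796
# p = 0.5000, D = 0.556
```

A positive $\delta$ (runs of the same color are favored) pushes $D$ above 1, and a negative $\delta$ pulls it below 1.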

9. In order to see how far the theory of simple chains agrees with
actual experiments, the author of this book has himself done extensive
experimental work. To form a chain of trials, one can take two sets of
cards containing red and black cards in different proportions, and 
proceed to draw one card at a time (returning it to the pack in which it 
belongs after each drawing) according to the following rules: At the 
outset one card is taken from a pack which we shall call the first set; 
then, whenever a red card is drawn, the next card is taken from the first 
set; but after a black card, the next one is taken from the second set. 
Evidently, these rules completely determine a series of trials possessing 
properties of a simple chain. In the first experiment the first pack 
contained 10 red and 10 black cards, while the second pack contained 5 
red and 15 black cards. Altogether, 10,000 drawings were made, and 
following their natural order, they were divided into 400 series of 25 
drawings each. The results are given in Table III. 

Table III. Distribution of Red Cards in 400 Series of 25 Cards

Frequency of   Difference   Number of series
red cards, m   m - 8        with these frequencies
1              -7           2
2              -6           4
3              -5           8
4              -4           27
5              -3           29
6              -2           54
7              -1           37
8               0           52
9               1           47
10              2           44
11              3           41
12              4           20
13              5           20
14              6           7
15              7           4
16              8           3
17              9           1

The sum of the numbers in column 3 is 400, as it should be. Taking the sum of the products of the numbers in columns 1 and 3, we get 3,323, which is the total number of red cards. The relative frequency of red cards in 10,000 trials is, therefore,

0.3323.
In our case

$$\alpha = \frac{1}{2}, \qquad \beta = \frac{1}{4}, \qquad \delta = \alpha - \beta = \frac{1}{4},$$

and the mean probability $p$ in an infinite series of trials is

$$p = \frac{1/4}{1 - 1/4} = \frac{1}{3} = 0.3333.$$

Thus, the observed relative frequency differs from $p$ only by $10^{-3}$, and in this respect the agreement between theory and experiment is very satisfactory. Now let us consider the theoretical divergence coefficient, for which we have the approximate expression

$$D = \frac{1+\delta}{1-\delta} - \frac{2\delta}{s(1-\delta)^2}.$$

Here we must substitute $\delta = 1/4$ and $s = 25$. The result is $D = 1.631$, approximately.

To find the empirical divergence coefficient we must first evaluate the sum

$$S = \sum\left(m - \frac{25}{3}\right)^2$$

extended over all 400 series. For the sake of easier calculation, we present $S$ thus:

$$S = \sum(m - 8)^2 - \frac{2}{3}\sum(m - 8) + \frac{400}{9}.$$

Now from Table III we get

$$\sum(m - 8)^2 = 3{,}521; \qquad \sum(m - 8) = 123,$$

whence

$$S = 3{,}483.4.$$

Dividing this number by $Npq = 10{,}000 \cdot \frac{1}{3}\cdot\frac{2}{3} = 2{,}222.2$, we find the empirical divergence coefficient

$$D' = 1.568,$$

which differs from $D = 1.631$ by only about 0.06, well within reasonable limits.
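The arithmetic of this section can be reproduced from Table III with the following sketch. It uses exact fractions, so $D'$ comes out as a ratio rather than a rounded decimal. The variable names are illustrative only.

```python
from fractions import Fraction

# Table III: number m of red cards in a series of 25 -> number of series
TABLE_III = {1: 2, 2: 4, 3: 8, 4: 27, 5: 29, 6: 54, 7: 37, 8: 52, 9: 47,
             10: 44, 11: 41, 12: 20, 13: 20, 14: 7, 15: 4, 16: 3, 17: 1}

s, p = 25, Fraction(1, 3)
n = sum(TABLE_III.values())                          # number of series
red = sum(m * c for m, c in TABLE_III.items())       # total number of red cards
S = sum(c * (m - s * p) ** 2 for m, c in TABLE_III.items())
d_empirical = S / (n * s * p * (1 - p))              # divide by N p q

print(n, red, round(float(S), 1), d_empirical)       # 400 3323 3483.4 31351/20000
```

The exact ratio 31351/20000 = 1.56755 rounds to the 1.568 quoted above.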

10. In two other experiments two packs were used: one containing 13 red and 7 black cards, and another containing 7 red and 13 black cards. In one experiment the pack with 13 red cards was considered as the first deck, and in the other experiment it became the second deck. The new experiments were conducted in the same way as that described in Sec. 9, but both were carried to 20,000 trials divided into 1,000 series of 20 trials each. In the first experiment we have

$$\alpha = \frac{13}{20}, \qquad \beta = \frac{7}{20}, \qquad \delta = \frac{3}{10}, \qquad p = \frac{1}{2}$$

and

$$D = 1.796, \text{ approximately},$$

while the same quantities for the second experiment are

$$\alpha = \frac{7}{20}, \qquad \beta = \frac{13}{20}, \qquad \delta = -\frac{3}{10}, \qquad p = \frac{1}{2}$$

and

$$D = 0.556, \text{ approximately}.$$

The results of these experiments are recorded in the following two tables:


Table IV. Concerning the First Experiment

Frequency of   Difference   Number of series
red cards, m   m - 10       with these frequencies
2              -8           3
3              -7           5
4              -6           18
5              -5           36
6              -4           59
7              -3           93
8              -2           103
9              -1           117
10              0           128
11              1           121
12              2           101
13              3           93
14              4           48
15              5           39
16              6           26
17              7           7
18              8           1
19              9           1
20             10           1


Table V. Concerning the Second Experiment

Frequency of   Difference   Number of series
red cards, m   m - 10       with these frequencies
5              -5           2
6              -4           10
7              -3           48
8              -2           112
9              -1           193
10              0           251
11              1           201
12              2           113
13              3           56
14              4           9
15              5           5







Taking the sum of the products of the numbers in columns 1 and 3, we find

10,036 and 10,045

as the total numbers of red cards in the first and second experiments. Dividing these numbers by 20,000, we have the following relative frequencies of red cards:

0.5018 and 0.50225,

extremely near to $p = 0.5$. From the first table we find that

$$\sum(m - 10)^2 = 8{,}924,$$

the summation being extended over all 1,000 series. Dividing this number by $20{,}000 \cdot \frac{1}{4} = 5{,}000$, we find the empirical divergence coefficient in the first experiment

$$D' = 1.785,$$

which comes close to

$$D = 1.796.$$

Likewise, from the second table we find

$$\sum(m - 10)^2 = 2{,}709,$$

whence, dividing by 5,000,

$$D' = 0.5418,$$

again close to

$$D = 0.5562.$$


Thus, all the essential circumstances foreseen theoretically, for simple 
chains of trials, are in excellent agreement with our experiments. 
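The same computation for Tables IV and V is shown in the sketch below. The function name is illustrative. The entry for $m = 14$ in Table V is taken as 9, because that is the only value consistent with the stated totals of 1,000 series, 10,045 red cards and $\sum(m - 10)^2 = 2{,}709$.

```python
from fractions import Fraction

# m (red cards in a series of 20) -> number of series
TABLE_IV = {2: 3, 3: 5, 4: 18, 5: 36, 6: 59, 7: 93, 8: 103, 9: 117, 10: 128,
            11: 121, 12: 101, 13: 93, 14: 48, 15: 39, 16: 26, 17: 7, 18: 1,
            19: 1, 20: 1}
TABLE_V = {5: 2, 6: 10, 7: 48, 8: 112, 9: 193, 10: 251, 11: 201, 12: 113,
           13: 56, 14: 9, 15: 5}

def summarize(table, s, p):
    """Number of series, red cards, sum of (m - s p)^2, and empirical D'."""
    n = sum(table.values())
    red = sum(m * c for m, c in table.items())
    squares = sum(c * (m - s * p) ** 2 for m, c in table.items())
    return n, red, squares, squares / (n * s * p * (1 - p))

for name, table in (("IV", TABLE_IV), ("V", TABLE_V)):
    n, red, squares, d_prime = summarize(table, s=20, p=Fraction(1, 2))
    print(name, n, red, squares, float(d_prime))
# IV 1000 10036 8924 1.7848
# V 1000 10045 2709 0.5418
```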


Problems for Solution 

1. From an urn originally containing $a$ white and $b$ black balls, $n$ balls are drawn in succession, each ball drawn being replaced by $1 + c$ ($c > 0$) balls of the same color before the next drawing. If $m$ is the frequency of white balls, show that the probability of the inequality

$$\left|\frac{m}{n} - \frac{a}{a+b}\right| < \varepsilon$$

does not tend to 1 as $n$ increases indefinitely (Markoff, G. Pólya).

Indication of the Proof. If $x_i = 1$ or $x_i = 0$ according as a white or a black ball appears in the $i$th drawing, we have

$$E(x_i) = E(x_i^2) = \frac{a}{a+b}, \qquad E(x_ix_j) = \frac{a}{a+b}\cdot\frac{a+c}{a+b+c} \quad (i \ne j).$$

Hence

$$B_n = E\left(x_1 - \frac{a}{a+b} + x_2 - \frac{a}{a+b} + \cdots + x_n - \frac{a}{a+b}\right)^2 = \frac{n^2abc}{(a+b)^2(a+b+c)} + \frac{nab}{(a+b)(a+b+c)}.$$
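For small $n$ the expression for $B_n$ can be verified by exact enumeration of all color sequences, as in this sketch. The function names are hypothetical, and the values $n = 6$, $a = 2$, $b = 3$, $c = 1$ are arbitrary illustrations. Because $B_n/n^2$ tends to $abc/((a+b)^2(a+b+c)) > 0$ rather than to 0, the usual argument for the law of large numbers cannot be applied here.

```python
from fractions import Fraction
from itertools import product

def polya_variance(n, a, b, c):
    """Exact variance of the number m of white balls in n drawings from the urn."""
    mean = second = Fraction(0)
    for seq in product((0, 1), repeat=n):          # 1 = white ball, 0 = black ball
        white, black, prob = a, b, Fraction(1)
        for x in seq:
            if x:
                prob *= Fraction(white, white + black)
                white += c                         # ball returned together with c more
            else:
                prob *= Fraction(black, white + black)
                black += c
        m = sum(seq)
        mean += prob * m
        second += prob * m * m
    return second - mean ** 2

def polya_formula(n, a, b, c):
    """B_n as given in the indication of the proof."""
    return (Fraction(n * n * a * b * c, (a + b) ** 2 * (a + b + c))
            + Fraction(n * a * b, (a + b) * (a + b + c)))

print(polya_variance(6, 2, 3, 1) == polya_formula(6, 2, 3, 1))   # True
```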


2. Marbe's Problem. A group of exactly $m$ uninterrupted successes $E$ or failures $F$ in a Bernoullian series of trials with the probability $p$ for a success is called an "$m$ sequence." If $\nu$ is the frequency of $m$ sequences in $n$ trials, show that the probability of the inequality

$$\left|\frac{\nu}{n} - (p^mq^2 + p^2q^m)\right| < \varepsilon$$

for a fixed $\varepsilon$ converges to 1 as $n$ becomes infinite.

Indication of the Proof. Associate with each of the $\mu = n - m + 1$ first trials variables $x_1, x_2, \ldots, x_\mu$ assuming only two values, 0 and 1. For $1 < i < \mu$ we set $x_i = 1$ if, beginning with the $i$th trial, a succession of $m$ letters $E$ or $F$ is preceded and followed by $F$ or $E$, respectively. In all other cases $x_i = 0$. We set $x_1 = 1$ if, beginning with the first trial, there is a succession of $m$ letters $E$ or $F$ ended by $F$ or $E$; otherwise $x_1 = 0$. Finally, $x_\mu = 1$ if, beginning with the $\mu$th trial, there is a succession of $m$ letters $E$ or $F$ preceded by $F$ or $E$; otherwise $x_\mu = 0$. Show that

$$E(x_1 + x_2 + \cdots + x_\mu) = (\mu - 2)(p^mq^2 + p^2q^m) + 2(p^mq + pq^m),$$

$$E(x_1 + x_2 + \cdots + x_\mu)^2 = n^2(p^mq^2 + p^2q^m)^2 + nP,$$

where $P$ remains bounded.

3. The following interesting series of dependent trials has been suggested by S. Bernstein: Two urns contain white and black balls. The probabilities of drawing white balls from the first and second urns are, respectively, $p$ and $p'$. The probabilities of drawing black balls from the same urns are $q = 1 - p$ and $q' = 1 - p'$. Finally, the probability of taking a ball from the first urn at the outset of the trials is $\alpha$. A series of trials is uniquely defined by the following rule: Whenever a white ball is drawn (and returned), the next ball is drawn from the same urn; but when a black ball is drawn, the next ball is taken from the other urn. Let $\alpha_n$ be the probability that the $n$th ball will be drawn from the first urn when the results of other drawings remain unknown. Under the same assumption, let $p_n$ be the probability of the $n$th ball being white. Find general expressions of $\alpha_n$ and $p_n$.

Hint:

$$\alpha_{n+1} = \alpha_n(p + p' - 1) + 1 - p',$$

whence

$$\alpha_n = \frac{1 - p'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{n-1}.$$

Also

$$p_n = \alpha_np + (1 - \alpha_n)p',$$

whence

$$p_n = \frac{p + p' - 2pp'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p - p')(p + p' - 1)^{n-1}.$$
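The closed forms in the hint can be checked against the recurrence with exact fractions, as in the following sketch. The function names are hypothetical, `pp` stands for $p'$, and the numerical values are arbitrary.

```python
from fractions import Fraction

def by_recurrence(n, alpha, p, pp):
    """Pairs (alpha_k, p_k), k = 1..n, from the recurrence in the hint (pp stands for p')."""
    a, pairs = alpha, []
    for _ in range(n):
        pairs.append((a, a * p + (1 - a) * pp))   # p_k = alpha_k p + (1 - alpha_k) p'
        a = a * (p + pp - 1) + 1 - pp             # alpha_{k+1}
    return pairs

def by_closed_form(k, alpha, p, pp):
    """alpha_k and p_k from the closed-form expressions in the hint."""
    lam = p + pp - 1
    a_star = (1 - pp) / (2 - p - pp)
    a_k = a_star + (alpha - a_star) * lam ** (k - 1)
    p_k = ((p + pp - 2 * p * pp) / (2 - p - pp)
           + (alpha - a_star) * (p - pp) * lam ** (k - 1))
    return a_k, p_k

alpha, p, pp = Fraction(1, 3), Fraction(3, 5), Fraction(1, 4)
for k, pair in enumerate(by_recurrence(12, alpha, p, pp), start=1):
    assert pair == by_closed_form(k, alpha, p, pp)
print("recurrence and closed form agree")
```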


4. When it becomes known that in the $i$th trial a white ball was drawn, what are the probabilities $\alpha_j^{(i)}$ and $p_j^{(i)}$ of taking a ball from the first urn in the $j$th ($j > i$) trial and of drawing a white ball in the same trial?

Hint: The probability $\alpha_i'$ that it was the first urn from which a white ball was drawn in the $i$th trial is determined by Bayes' formula:

$$\alpha_i' = \frac{\alpha_ip}{p_i}.$$

For $n \ge i + 1$

$$\alpha_{i+1}^{(i)} = \alpha_i', \qquad \alpha_{n+1}^{(i)} = \alpha_n^{(i)}(p + p' - 1) + 1 - p',$$

whence

$$\alpha_j^{(i)} = \frac{1 - p'}{2 - p - p'} + \left(\frac{\alpha_ip}{p_i} - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{j-i-1}$$

for $j > i + 1$. Furthermore

$$p_j^{(i)} = \alpha_j^{(i)}p + \bigl(1 - \alpha_j^{(i)}\bigr)p'$$

for $j \ge i + 1$.

5. From now on we shall assume $p + p' = 1$, or $p' = q$, $q' = p$. Show that the law of large numbers can be applied to the variables $x_1, x_2, x_3, \ldots$, which are defined in the usual way:

$x_i = 1$ if a white ball is drawn in the $i$th trial,
$x_i = 0$ if a black ball is drawn in the $i$th trial.

Indication of the Proof. Evidently $E(x_i) = E(x_i^2) = p_i$. Furthermore

$$B_n = \sum_{i=1}^{n}E(x_i - p_i)^2 + 2\sum_{j>i}E(x_i - p_i)(x_j - p_j).$$

Now

$$E(x_i - p_i)^2 = 2pq(1 - 2pq), \quad i > 1;$$

$$E(x_1 - p_1)^2 = pq + \alpha(1 - \alpha)(p - q)^2.$$

For $j > i > 1$

$$E(x_i - p_i)(x_j - p_j) = 0 \quad\text{if } j > i + 1,$$

$$E(x_i - p_i)(x_{i+1} - p_{i+1}) = pq(1 - 4pq).$$

For $i = 1$ and $j > 1$

$$E(x_1 - p_1)(x_j - p_j) = 0 \quad\text{if } j > 2,$$

$$E(x_1 - p_1)(x_2 - p_2) = \alpha p^2 + (1 - \alpha)q^2 - (1 - 2pq)\bigl(q + (p - q)\alpha\bigr).$$

Hence

$$B_n \sim 4pq(1 - 3pq)n,$$

and the law of large numbers holds. It can be stated as follows: If in $n$ trials the frequency of white balls is $m$, then the probability of the inequality

$$\left|\frac{m}{n} - (p^2 + q^2)\right| < \varepsilon$$

tends to 1 as $n$ tends to infinity, for any given positive number $\varepsilon$.

6. Let $r = p^2 + q^2$ be the mean probability in infinitely many trials. Find the divergence coefficient

$$D = \frac{\sum_{i=1}^{n}E(m_i - sr)^2}{Nr(1 - r)}$$

when $N = ns$ trials are divided into $n$ consecutive groups containing $s$ trials each.

Indication of Solution. From the foregoing formulas it follows that

$$E(x_a - r + x_{a+1} - r + \cdots + x_{a+s-1} - r)^2 = 4spq(1 - 3pq) - 2pq(1 - 4pq)$$

if $a > 1$. Hence

$$\sum_{i=2}^{n}E(m_i - sr)^2 = 4Npq(1 - 3pq) - 4spq(1 - 3pq) - 2(n - 1)pq(1 - 4pq).$$

Again

$$E(m_1 - sr)^2 = 4spq(1 - 3pq) - 2pq(3 - 10pq) + p(1 - 6q + 12q^2 - 4q^3) - \alpha(p - q)(1 - 8pq),$$

so that finally

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)} + \frac{(p - q)(p - \alpha)(1 - 8pq)}{2Npq(1 - 2pq)}.$$

For large $N$, with a good approximation,

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)}.$$

7. Two sets of cards containing, respectively, 12 red and 4 black cards (the first deck) and 4 red and 12 black cards (the second deck) were used in the following experiment: The first card was taken from the first deck, and in the following trials, after a red card the next one was taken from the same deck, but after a black one the next card was taken from the other deck. Altogether 25,000 cards were drawn, and in their natural order they were divided into 1,000 series of 25 cards each. The results are recorded in Table VI. How close is the agreement between this experiment and the theory?


Table VI. Distribution of Red Cards in 1,000 Series of 25 Cards

Frequency of   Difference   Number of series
red cards, m   m - 16       with these frequencies
6              -10          1
7               -9          1
8               -8          1
9               -7          12
10              -6          13
11              -5          43
12              -4          65
13              -3          92
14              -2          101
15              -1          162
16               0          94
17               1          164
18               2          68
19               3          110
20               4          26
21               5          28
22               6          10
23               7          7
24               8          1
25               9          1
Ans. In the present case $p = q' = 3/4$ and $p' = q = 1/4$. Mean probability in infinitely many trials:

$$\frac{9}{16} + \frac{1}{16} = \frac{5}{8} = 0.625.$$

Theoretical divergence coefficient: $D = 1.384$. Frequency of red cards: 15,696. Relative frequency:

$$\frac{15{,}696}{25{,}000} = 0.62784,$$

close to 0.625.

Empirical divergence coefficient: $D' = 1.3845$, very close to 1.384.

The probability of taking a card from the second deck is 0.25. Now, by actual counting, it was found that in 7,500 trials a card was taken from the second deck 1,856 times. Hence, the relative frequency of this event in 7,500 trials is

$$\frac{1{,}856}{7{,}500} = 0.2475,$$

again very close to 0.25.
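The figures in this answer can be reproduced from Table VI and the large-$N$ formula of Problem 6 with the following sketch. It uses exact fractions, and the names are illustrative. Here $p = 3/4$, $s = 25$, and the mean probability is $r = p^2 + q^2$.

```python
from fractions import Fraction

# Table VI: number m of red cards in a series of 25 -> number of series
TABLE_VI = {6: 1, 7: 1, 8: 1, 9: 12, 10: 13, 11: 43, 12: 65, 13: 92, 14: 101,
            15: 162, 16: 94, 17: 164, 18: 68, 19: 110, 20: 26, 21: 28, 22: 10,
            23: 7, 24: 1, 25: 1}

p = Fraction(3, 4)
q = 1 - p
s = 25
r = p ** 2 + q ** 2                      # mean probability in infinitely many trials
pq = p * q

# Large-N approximation from Problem 6
d_theory = (2 - 6 * pq) / (1 - 2 * pq) - (1 - 4 * pq) / (s * (1 - 2 * pq))

n = sum(TABLE_VI.values())               # number of series
red = sum(m * c for m, c in TABLE_VI.items())
squares = sum(c * (m - s * r) ** 2 for m, c in TABLE_VI.items())
d_empirical = squares / (n * s * r * (1 - r))

print(r, d_theory, float(d_theory))      # 5/8 173/125 1.384
print(n, red, float(d_empirical))
```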
CHAPTER XII


PROBABILITIES IN CONTINUUM 

1. In the preceding parts of this book, whenever we dealt with 
stochastic variables, it was understood that their range of variation was 
represented by a finite set of numbers. Although, for the sake of better 
understanding of the subject, it was natural to begin with this simplest 
case, there are many reasons why it is necessary to introduce into the 
calculus of probability stochastic variables with infinitely many values. 
Such variables present themselves naturally in many cases of the type of 
Buffon’s needle problem which we had occasion to mention in Chap. VI. 

On the other hand, even in dealing with stochastic variables with a 
finite, but very large number of values, it is often profitable for the sake 
of approximate evaluations, to substitute for them fictitious variables 
with infinitely many values. Among these the most important ones by 
far are continuous variables. 

Case of One Variable 

2. Beginning with the case of a single continuous variable $x$, we must assume that its range of variation is known and represented by a given interval $(a, b)$, finite or infinite. Knowledge of the range of variation of $x$ alone would not enable us to consider $x$ as a stochastic variable; to be able to do so, we must introduce considerations of probability in some form or other. For a continuous variable it is as unnatural to speak of the probability of any selected single value as it is to speak of the dimension of a single selected point on a line. But just as we speak of the length of a segment of a line, we may introduce the notion of the probability that $x$ will be confined to a given interval $(c, d)$, part of $(a, b)$.

In introducing this new notion of probability in any manner whatsoever, we must be careful not to fall into contradiction with the laws of probability which are assumed as fundamental. To this end, if P(c, d) is the probability for x to lie in the interval (c, d), we are led to assume

1° $P(c, d) \ge 0$;
2° $P(a, b) = 1$.

The first assumption is an expression of the fact that probability can never be negative. The second assumption corresponds to the fact that x certainly assumes one out of the totality of its possible values.
Next, if the interval (c, d) is divided into two adjoining intervals (c, e) and (e, d), we assume

3° $P(c, d) = P(c, e) + P(e, d)$

in conformity with the theorem of total probability.

For continuous variables it is furthermore assumed: 4° for an infinitesimal interval (c, d), P(c, d) is also infinitesimal.

Properties 3° and 4° show that P(c, d) is a continuous function of c and d and that

$$P(c, c) = 0.$$

In other words, the probability that x will assume any given value is 0. At the same time P(c, d) represents the probability of any one of the four inequalities

$$c < x < d; \qquad c \le x < d; \qquad c < x \le d; \qquad c \le x \le d.$$

3. A simple example will serve to clarify these general considerations. 
A small ball of negligible dimensions is made to move on the rim of a 
circular disk. It is set in motion by a vehement impulse and after many 
complete revolutions, retarded by friction and the resistance of the air, 
comes to rest. The variety and complexity of causes influencing the 
motion of the ball make it impossible to foresee the final position of the 
ball when it comes to rest and the whole phenomenon bears characteristic 
features of a play of chance. The stochastic variable associated with this 
chance phenomenon is the distance from a certain definite point on the 
rim (origin) to the final position of the ball, counted in a definite direction, 
for example, clockwise. This variable, when we consider the ball as a 
mere point, may have any value between 0 and the length of the rim. 
The question now arises, how to define the probability that the ball will 
stop in a specified portion of the rim, or else that the variable we consider 
will have a value belonging to a definite interval, part of its total range 
of variation. In trying to define this probability, we must observe the 
fundamental requirements set forth in Sec. 2. Besides that, we must of 
necessity resort to considerations which are not mathematical in their 
nature but are based partly on aprioristic and partly on experimental 
grounds. Suppose we take two equal arcs on the rim. There is nothing 
perceptible a priori that would make the ball stop in one arc rather than 
in another. Besides, actual experiments show that the ball stops in one 
arc approximately the same number of times as in another, and this 
experimental knowledge together with aprioristic considerations suggests 
the assumption that we must attribute equal probabilities to equal arcs, 
irrespective of the position of the arcs on the rim. As soon as we agree on this assumption or hypothesis, the problem becomes mathematical and can easily be solved.
Before proceeding to the solution, a remark on the meaning of zero probability in connection with continuous variables is not out of place. Zero probability in this case does not mean logical impossibility. We attribute zero probability to the event that the ball will stop precisely at the origin. However, that possibility is not altogether excluded insofar as we consider the origin and the ball as mere points. The question lacks sense if we deal with a material ball and a material rim, no matter how small the former and how fine the latter.

4. A stochastic variable is said to have a uniform distribution of probability if the probabilities attached to two equal intervals are equal. This means that P(c, d) depends only upon the length d − c = s of the interval (c, d) and accordingly can be denoted simply by P(s). Combining two adjoining intervals of the respective lengths s and s' into a single interval of length s + s', according to requirement 3°, we must have

$$P(s + s') = P(s) + P(s'). \qquad (1)$$

Suppose now that the interval (a, b) of the length b − a = l, representing the whole range of variation of x, is divided into n equal intervals of the length l/n. The repeated application of equation (1) gives

$$P(l) = n\,P\!\left(\frac{l}{n}\right).$$

But by requirement 2°, P(l) = 1, and hence

$$P\!\left(\frac{l}{n}\right) = \frac{1}{n}.$$

Again, repeated application of (1) gives

$$P\!\left(\frac{m}{n}\,l\right) = \frac{m}{n}$$


for any integer m < n. Now let us take any interval of length s. For an appropriate m it will contain an interval of length $\frac{m}{n}l$ and be contained in an interval of length $\frac{m+1}{n}l$; hence, referring to requirements 1° and 3°, we shall have

$$\frac{m}{n} \le P(s) \le \frac{m+1}{n},$$

while

$$\frac{m}{n}\,l \le s < \frac{m+1}{n}\,l$$

or

$$\frac{m}{n} \le \frac{s}{l} < \frac{m+1}{n}.$$

Since P(s) and s/l are contained in the same interval of length 1/n,

$$\left|P(s) - \frac{s}{l}\right| \le \frac{1}{n},$$

and this being true for an arbitrary n, no matter how large, it follows that

$$P(s) = \frac{s}{l}.$$

Thus for a variable x with uniform distribution of probability, the probability of assuming a value belonging to an interval of length s is given by the ratio of s to the length l of the whole range of variation of x.
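As a rough numerical check of this result, the following Python sketch simulates the ball of Sec. 3, assuming (as the hypothesis there states) that its stopping position is uniform on a rim of length l; `random.uniform` stands in for the physical mechanism, and the function name, seed and number of trials are illustrative choices only. The observed frequency of stops in an arc (c, d) should come close to s/l with s = d − c.

```python
import random

def rim_stop_probability(l, c, d, trials=200_000, seed=1):
    """Estimate P(c < x < d) when the stopping point x is uniform on (0, l).

    The uniform hypothesis of Sec. 3 is modelled by random.uniform(0, l)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if c < rng.uniform(0, l) < d)
    return hits / trials

l, c, d = 10.0, 2.5, 6.0
estimate = rim_stop_probability(l, c, d)
exact = (d - c) / l          # P(s) = s / l with s = d - c
print(estimate, exact)
```

Two arcs of equal length placed anywhere on the rim should give nearly equal frequencies, which is exactly the defining property of the uniform distribution.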

5. In the general case, when we cannot assume a uniform distribution of probability throughout the whole range of variation of x, we let ourselves be guided by an analogy with a mass distributed continuously over a line. In fact, the distribution of a mass satisfies all the requirements set forth for probability. In particular, the mass Δm contained in an infinitesimal interval (z, z + Δz) is also infinitesimal, and the mean density

$$\frac{\Delta m}{\Delta z}$$

is generally supposed to tend, as Δz converges to 0, to a limit called the “density at the point z.” If this density ρ(z) is known, the mass contained in any interval (c, d) is represented by an integral

$$\int_c^d \rho(z)\,dz.$$

Following this analogy, we admit that the mean density of probability

$$\frac{P(z, z + \Delta z)}{\Delta z}$$

tends to a limit f(z), the density of probability at the point z, when the length Δz of the interval tends to 0. Hence, again, the probability corresponding to an interval (c, d) will be represented by the integral

$$P(c, d) = \int_c^d f(z)\,dz.$$

This expression satisfies all the requirements of Sec. 2 if the density of probability f(z) is subject to two conditions:
(a) $f(z) \ge 0$ for all z in (a, b);

(b) $\displaystyle\int_a^b f(z)\,dz = 1.$

The second condition implies, of course, the existence of the integral itself. 
But in all cases of any importance the density is continuous, save for 
discontinuities of the simplest kind which do not cause any doubts as 
to the existence of the above integral. 

From the general expression of P(c, d) it follows that for an infinitesimal interval (z, z + dz) the probability is given by f(z)dz, neglecting infinitesimals of higher order. For the uniform distribution of probability over an interval of length l the density is constant and equal to 1/l.

In other cases we cannot expect to obtain a definite expression for the density unless the variable itself is sufficiently characterized by additional conditions, either hypothetical or implied by the problem. Thus, for instance, in applications of probability to problems of theoretical physics, physicists have succeeded in obtaining definite probability distributions by invoking physical laws of admitted universal validity together with some plausible hypotheses.

6. The interval containing all possible values of a stochastic variable may be finite or infinite according to the nature of that variable. However, in all cases we may take the largest possible interval, from −∞ to +∞; to this end it suffices to define the density outside of the originally given interval as being 0. Then the density will be defined for all real values of z and will satisfy the conditions:

(a) $f(z) \ge 0$ for all z;

(b) $\displaystyle\int_{-\infty}^{+\infty} f(z)\,dz = 1.$

Furthermore, the probability for x to be in any interval (c, d) will be given by

$$\int_c^d f(z)\,dz.$$

In particular, taking c = −∞ and writing t instead of d,

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

represents the probability that x will be less than t (or, what amounts to the same, will not exceed t). Considered as a function of t, F(t) never decreases and varies between F(−∞) = 0 and F(+∞) = 1. It is called the “distribution function of probability.” In case x has a uniform distribution of probability over an interval (a, b), its distribution function is evidently defined as follows:

$$F(t) = 0 \ \text{ for } t < a; \qquad F(t) = \frac{t - a}{b - a} \ \text{ for } a \le t \le b; \qquad F(t) = 1 \ \text{ for } t > b.$$

Its graph is shown in Fig. 1 on page 240.
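The relation between a density and its distribution function can be checked numerically. The sketch below, a minimal illustration rather than anything in the text, computes P(c, d) for the uniform density over (a, b) by a midpoint-rule integral and compares it with F(d) − F(c) from the piecewise formula above; the helper names, the interval values and the step count are assumptions of the sketch.

```python
def uniform_density(z, a, b):
    """Density of the uniform distribution over (a, b), taken as 0 outside it."""
    return 1.0 / (b - a) if a <= z <= b else 0.0

def uniform_cdf(t, a, b):
    """Distribution function F(t) of the uniform distribution over (a, b)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)

def prob_from_density(f, c, d, steps=10_000):
    """Approximate P(c, d), the integral of f over (c, d), by the midpoint rule."""
    h = (d - c) / steps
    return h * sum(f(c + (k + 0.5) * h) for k in range(steps))

a, b = 1.0, 4.0
c, d = 0.0, 2.5
numeric = prob_from_density(lambda z: uniform_density(z, a, b), c, d)
closed_form = uniform_cdf(d, a, b) - uniform_cdf(c, a, b)
print(numeric, closed_form)   # both close to 0.5
```

The interval (c, d) deliberately starts outside (a, b), where the density has been defined as 0, to show that extending the range to the whole line changes nothing.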
7. The definition of mathematical expectation can easily be extended to continuous variables; namely, the expectation of x, or the mean value of x, is defined by

$$E(x) = \int_{-\infty}^{+\infty} z f(z)\,dz,$$

provided this integral exists. Similarly, the mathematical expectation of any function φ(x) is given by

$$E[\varphi(x)] = \int_{-\infty}^{+\infty} \varphi(z) f(z)\,dz.$$

Of course, the existence of the integral in the right member is presupposed again. When this integral does not exist, it is meaningless to speak of the mathematical expectation of φ(x).

[Fig. 1. Axis marked −∞, a, b, +∞.]

The mathematical expectation of the power x^n with positive integer exponent n is called the moment of the order n, or nth moment. We shall denote it by m_n, so that

$$m_n = \int_{-\infty}^{+\infty} z^n f(z)\,dz.$$

The dispersion D and the standard deviation σ of x are defined in the same way as in Chap. IX; namely,

$$D = \sigma^2 = \int_{-\infty}^{+\infty} (z - m_1)^2 f(z)\,dz = m_2 - m_1^2.$$

Often it is advisable to consider the mathematical expectation of |x|^α, where α may be any real number, ordinarily positive. This expectation is called the “absolute moment of the order α.” Its expression is

$$\mu_\alpha = \int_{-\infty}^{+\infty} |z|^\alpha f(z)\,dz,$$

and it is evident that

$$m_{2k} = \mu_{2k}; \qquad |m_{2k+1}| \le \mu_{2k+1}.$$

The mathematical expectation of the function

$$e^{itx},$$

where t is a real variable, is of the utmost importance. It is called the “characteristic function” of the distribution and is defined by

$$\varphi(t) = \int_{-\infty}^{+\infty} e^{itz} f(z)\,dz.$$

Since f(z) ≥ 0 and

$$\int_{-\infty}^{+\infty} f(z)\,dz = 1,$$

the integral defining φ(t) is always convergent and

$$|\varphi(t)| \le 1.$$

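The distribution is completely determined by its characteristic function. For, by the Fourier theorem,

$$\frac{1}{2\pi}\int_{-\infty}^{+\infty} dt \int_{-\infty}^{+\infty} f(z)\,e^{it(z - x)}\,dz = f(x)$$

at all points of continuity of f(x). But the left-hand member is

$$\frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(t)\,e^{-itx}\,dt$$

by the definition of φ(t), and so

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \varphi(t)\,e^{-itx}\,dt.$$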
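For the uniform density of Example 1 below (constant 1/l on (0, l)), direct integration gives φ(t) = (e^{itl} − 1)/(itl) for t ≠ 0 and φ(0) = 1; this closed form is worked out here for illustration and is not taken from the text. The Python sketch computes φ(t) numerically from its defining integral with the midpoint rule, compares it with that expression, and prints |φ(t)|, which should never exceed 1. The helper names, l = 2 and the step count are assumptions.

```python
import cmath

def char_fn_numeric(f, t, lo, hi, steps=20_000):
    """phi(t) = integral of exp(i t z) f(z) over (lo, hi), by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += cmath.exp(1j * t * z) * f(z)
    return total * h

l = 2.0

def f(z):
    """Uniform density 1/l on (0, l)."""
    return 1.0 / l

def char_fn_uniform(t):
    """Closed form (e^{itl} - 1) / (itl), with phi(0) = 1."""
    if t == 0:
        return 1.0 + 0j
    return (cmath.exp(1j * t * l) - 1) / (1j * t * l)

for t in (0.0, 0.7, 3.0, 10.0):
    num = char_fn_numeric(f, t, 0.0, l)
    print(t, num, char_fn_uniform(t), abs(num))
```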

8. To illustrate the preceding general explanations, we shall now consider a few examples.

Example 1. Let x be a variable with uniform distribution of probability over the interval (0, l). The density of this distribution being constant,

$$f(z) = \frac{1}{l},$$

the mean value of x is

$$m_1 = \frac{1}{l}\int_0^l z\,dz = \frac{l}{2},$$

and the second moment

$$m_2 = \frac{1}{l}\int_0^l z^2\,dz = \frac{l^2}{3}.$$

Hence, the square of the standard deviation is

$$\sigma^2 = m_2 - m_1^2 = \frac{l^2}{12}.$$

This simple example may be used to illustrate a remark made at the beginning of this chapter, that sometimes it is profitable to substitute a fictitious continuous variable for a variable with a finite but large number of values. Suppose that in flipping a coin n times, we mark heads by 1 and tails by 0, thus obtaining a sequence of n units and zeros altogether, disposed in the order of trials. This sequence may be considered as the successive digits ε_1, ε_2, ..., ε_n in the binary representation of a fraction

$$X = \frac{\varepsilon_1}{2} + \frac{\varepsilon_2}{4} + \cdots + \frac{\varepsilon_n}{2^n}$$

contained between 0 and 1. X may be considered as a stochastic variable with 2^n values, each having the probability 1/2^n. The probability Π(α, β) that X will be contained in the interval (α, β), or more definitely that X will satisfy the inequalities

$$\alpha < X \le \beta,$$
is obviously obtained by multiplying the number of integers N contained within the limits

$$2^n\alpha < N \le 2^n\beta$$

by 1/2^n. Now there are exactly

$$[2^n\beta] - [2^n\alpha] = 2^n(\beta - \alpha) + \theta, \qquad -1 < \theta < 1,$$

such integers; hence

$$\Pi(\alpha, \beta) = \beta - \alpha + \frac{\theta}{2^n}.$$

If n is even moderately large, this probability is very near to the probability

$$P(\alpha, \beta) = \beta - \alpha$$

that a fictitious variable x with uniform distribution over the interval (0, 1) will assume a value in the interval (α, β). The first two moments of the variable X are, respectively,

$$M_1 = \frac{0 + 1 + 2 + \cdots + (2^n - 1)}{2^{2n}} = \frac{1}{2} - \frac{1}{2^{n+1}},$$

$$M_2 = \frac{0^2 + 1^2 + 2^2 + \cdots + (2^n - 1)^2}{2^{3n}} = \frac{1}{3} - \frac{1}{2^{n+1}} + \frac{1}{3\cdot 2^{2n+1}},$$

and differ little from the respective moments 1/2 and 1/3 of the fictitious continuous variable. Without losing anything essential, we here gain considerably in simplicity by substituting a fictitious continuous variable for the discontinuous variable X.
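The closed forms for Π(α, β), M_1 and M_2 can be checked with exact rational arithmetic. The sketch below, an illustration under the assumptions of the example (all 2^n values N/2^n equally likely), enumerates the values with Python's `fractions.Fraction`; the function names and the choice n = 10, α = 1/5, β = 2/3 are arbitrary.

```python
from fractions import Fraction

def binary_fraction_moments(n):
    """Exact M1, M2 of X = N / 2**n with N = 0, 1, ..., 2**n - 1 equally likely."""
    size = 2 ** n
    m1 = sum(Fraction(N, size) for N in range(size)) / size
    m2 = sum(Fraction(N, size) ** 2 for N in range(size)) / size
    return m1, m2

def binary_fraction_prob(n, alpha, beta):
    """Pi(alpha, beta): the fraction of the 2**n values of X with alpha < X <= beta."""
    size = 2 ** n
    count = sum(1 for N in range(size) if alpha < Fraction(N, size) <= beta)
    return Fraction(count, size)

n = 10
m1, m2 = binary_fraction_moments(n)
print(m1 == Fraction(1, 2) - Fraction(1, 2 ** (n + 1)))          # True
print(m2 == Fraction(1, 3) - Fraction(1, 2 ** (n + 1))
      + Fraction(1, 3 * 2 ** (2 * n + 1)))                       # True
alpha, beta = Fraction(1, 5), Fraction(2, 3)
print(float(binary_fraction_prob(n, alpha, beta)), float(beta - alpha))
```

Already for n = 10 the discrete probability differs from β − α by less than 1/1024, as the bound θ/2^n predicts.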

Example 2. A thin bar can rotate freely about its middle point P. It is set in motion and after several revolutions comes to a stop pointing toward a point X on a line l. The position of the bar is determined by the angle θ formed by the bar and the perpendicular PO dropped from P upon l; θ varies between the limits −π/2 and π/2, and its distribution is supposed to be uniform. The position of X is determined by its distance OX = x from O, this distance being positive or negative according as X is to the right or to the left of the point O. It is required to find the distribution of the probability of x.

[Fig. 2.]

The relation between θ and x is

$$x = a\tan\theta$$

if OP = a, or, conversely,

$$\theta = \arctan\frac{x}{a}.$$

By differentiation we find the relation between dθ and dx:

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

Now, by hypothesis, the probability that ∠OPX will be contained between θ and θ + dθ is

$$\frac{d\theta}{\pi} = \frac{1}{\pi}\,\frac{a\,dx}{a^2 + x^2}.$$
And the probability that the distance of X from O will be contained between x and x + dx is the same. Hence, the density of probability for the variable x is

$$\frac{1}{\pi}\,\frac{a}{a^2 + x^2},$$

and the probability corresponding to a finite interval (c, d) is given by

$$P(c, d) = \frac{1}{\pi}\int_c^d \frac{a\,dz}{a^2 + z^2}.$$

For the whole range of variation of x,

$$\frac{1}{\pi}\int_{-\infty}^{+\infty} \frac{a\,dz}{a^2 + z^2} = 1,$$

as it should be. However, we cannot speak of the mean value of x or of moments of higher order, since the integrals

$$\int_{-\infty}^{+\infty} \frac{x\,dx}{a^2 + x^2}, \qquad \int_{-\infty}^{+\infty} \frac{x^2\,dx}{a^2 + x^2}, \quad \text{etc.}$$

have no meaning. But the characteristic function φ(t) exists and is given by

$$\varphi(t) = \frac{a}{\pi}\int_{-\infty}^{+\infty} \frac{e^{itx}\,dx}{a^2 + x^2}.$$
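Example 2 lends itself to simulation. The Python sketch below assumes, as the example does, that θ is uniform on (−π/2, π/2), sets x = a tan θ, and compares the observed frequency of c < x < d with P(c, d) = (1/π)(arctan(d/a) − arctan(c/a)), which is the integral above evaluated in closed form. It also prints running averages of x; because E(x) does not exist for this density, they are not expected to settle down. The names, the values a = 1.5, c = −1, d = 3, the seed and the sample sizes are illustrative.

```python
import math
import random

def bar_stop_positions(a, trials, seed=2):
    """Example 2: theta uniform on (-pi/2, pi/2), X at signed distance x = a*tan(theta)."""
    rng = random.Random(seed)
    return [a * math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(trials)]

def cauchy_prob(a, c, d):
    """P(c, d) = (1/pi) * integral_c^d a dz / (a^2 + z^2) in closed form."""
    return (math.atan(d / a) - math.atan(c / a)) / math.pi

a = 1.5
xs = bar_stop_positions(a, 200_000)
c, d = -1.0, 3.0
empirical = sum(c < x < d for x in xs) / len(xs)
print(empirical, cauchy_prob(a, c, d))

# Running means of x: with no finite E(x), these need not approach any limit.
for k in (1_000, 10_000, 100_000, 200_000):
    print(k, sum(xs[:k]) / k)
```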


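Example 3. One of the most important distributions (theoretically and practically) is the so-called Gaussian or “normal” distribution. The density of this distribution is given by

$$f(z) = K e^{-h^2 (z - a)^2},$$

with three parameters K, h, a. However, only two of these parameters are independent, since we must have

$$\int_{-\infty}^{+\infty} f(z)\,dz = K\int_{-\infty}^{+\infty} e^{-h^2(z-a)^2}\,dz = \frac{K}{h}\int_{-\infty}^{+\infty} e^{-u^2}\,du = \frac{K\sqrt{\pi}}{h} = 1,$$

and finally

$$K = \frac{h}{\sqrt{\pi}}.$$

To find the meaning of a and h we observe that the mean value of our variable is

$$\int_{-\infty}^{+\infty} z f(z)\,dz = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{+\infty} (z - a)\,e^{-h^2(z-a)^2}\,dz + \frac{ah}{\sqrt{\pi}}\int_{-\infty}^{+\infty} e^{-h^2(z-a)^2}\,dz = a,$$

since

$$\int_{-\infty}^{+\infty} (z - a)\,e^{-h^2(z-a)^2}\,dz = \frac{1}{h^2}\int_{-\infty}^{+\infty} u\,e^{-u^2}\,du = 0.$$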
Thus a has the meaning of the mean value of the normally distributed variable x. The square of the standard deviation is given by

$$\sigma^2 = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{+\infty} (z - a)^2 e^{-h^2(z-a)^2}\,dz = \frac{1}{h^2\sqrt{\pi}}\int_{-\infty}^{+\infty} u^2 e^{-u^2}\,du = \frac{1}{2h^2},$$

whence

$$h = \frac{1}{\sigma\sqrt{2}}.$$

Thus for the normally distributed variable with the mean a and standard deviation σ the density of probability is

$$f(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(z-a)^2}{2\sigma^2}}.$$

Finally, for the variable u = x − a with the mean value 0 and the same standard deviation, the expression of density takes the simplest form

$$f(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{z^2}{2\sigma^2}},$$

and the distribution function of probability is represented by the integral

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{z^2}{2\sigma^2}}\,dz.$$

The curve of density

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}},$$

or the probability curve, has a bell-shaped form, as shown in the figure corresponding to σ = 1. It has a single maximum corresponding to x = 0, and on both sides of this maximum it rapidly approaches the x axis.

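[Fig. 3. The probability curve for σ = 1.]

The characteristic function of the normal distribution has a very simple form. By definition

$$\varphi(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{itz - \frac{z^2}{2\sigma^2}}\,dz = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{z^2}{2\sigma^2}}\cos tz\,dz.$$

But as

$$\int_{-\infty}^{+\infty} e^{-\alpha x^2}\cos\beta x\,dx = \sqrt{\frac{\pi}{\alpha}}\,e^{-\frac{\beta^2}{4\alpha}} \qquad (\alpha > 0),$$

we find that

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}}.$$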

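The moments of the normal distribution (with the mean = 0) can now be easily found. From the definition of the characteristic function it follows that

$$\varphi(t) = \sum_{n=0}^{\infty} \frac{i^n m_n}{n!}\,t^n.$$

In our case

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}} = \sum_{k=0}^{\infty} \frac{(-1)^k \sigma^{2k}}{2^k\,k!}\,t^{2k},$$

whence

$$\frac{i^{2k+1} m_{2k+1}}{(2k+1)!} = 0, \qquad \frac{i^{2k} m_{2k}}{(2k)!} = \frac{(-1)^k \sigma^{2k}}{2^k\,k!}.$$

Thus

$$m_{2k+1} = 0, \qquad m_{2k} = 1\cdot 3\cdot 5\cdots(2k - 1)\,\sigma^{2k}.$$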
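The formulas m_{2k+1} = 0, m_{2k} = 1·3·5···(2k − 1)σ^{2k} and φ(t) = e^{−σ²t²/2} can be confirmed by numerical quadrature. The sketch below integrates over (−20σ, 20σ) with Simpson's rule, on the assumption that the tails beyond that range are negligible; σ = 1.5, t = 0.8 and the step count are arbitrary choices, and the helper names are hypothetical.

```python
import math

def normal_density(z, sigma):
    """Normal density with mean 0 and standard deviation sigma."""
    return math.exp(-z * z / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, lo, hi, steps=40_000):
    """Composite Simpson rule on (lo, hi); steps must be even."""
    h = (hi - lo) / steps
    s = g(lo) + g(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3

sigma = 1.5
L = 20 * sigma   # truncation of the infinite range; the neglected tails are tiny
for n in range(1, 9):
    m_n = integrate(lambda z: z ** n * normal_density(z, sigma), -L, L)
    # 1*3*5*...*(n-1) * sigma**n for even n, and 0 for odd n
    predicted = 0.0 if n % 2 else math.prod(range(1, n, 2)) * sigma ** n
    print(n, m_n, predicted)

t = 0.8
phi_t = integrate(lambda z: math.cos(t * z) * normal_density(z, sigma), -L, L)
print(phi_t, math.exp(-sigma ** 2 * t ** 2 / 2))
```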


Case of Two or More Variables 
9. By analogy it is now easy to extend the notion of probability to two or more variables considered simultaneously. A pair of special values x, y of two stochastic variables X, Y will be represented geometrically by a point with the coordinates x, y referred to a rectangular system of axes. The domain S of all the possible values of X and Y will be represented by a portion (finite or infinite) of a plane with a definite boundary, unless this domain coincides with the whole plane. The probability that the point x, y should belong to an infinitesimal area dx dy will be expressed by the product φ(x, y) dx dy, where the function φ(x, y) is again called the density of probability at the point x, y. The density of probability must satisfy two requirements: it is non-negative in the whole domain S, and

$$\iint_S \varphi(x, y)\,dx\,dy = 1,$$

where the double integral is extended over all the domain S. The probability for the point x, y to be located in a given domain σ is then given by the integral

$$\iint_\sigma \varphi(x, y)\,dx\,dy$$

extended over σ.

If φ(x, y) is a constant in S, the distribution of probability is called uniform. The domain S in this case must be finite, and if its area is denoted by the same letter, then

$$\varphi(x, y) = \frac{1}{S}.$$

The probability for the point x, y to be within the domain σ will be given by the ratio

$$\frac{\sigma}{S},$$

denoting the area of the domain σ by σ again.
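For a uniform distribution in a plane domain, the ratio σ/S can be estimated by throwing random points. In the sketch below, which is only an illustration, S is taken to be the unit square and σ the quarter of the unit disk lying in it, so that σ/S = π/4; both choices, the seed and the trial count are assumptions of the example.

```python
import math
import random

def in_quarter_disk(x, y):
    """Membership test for sigma: the part of the unit disk inside the unit square."""
    return x * x + y * y < 1.0

def prob_in_subdomain(inside, trials=200_000, seed=3):
    """Point uniform in the unit square S; estimate P(point in sigma)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if inside(rng.random(), rng.random()))
    return hits / trials

estimate = prob_in_subdomain(in_quarter_disk)
print(estimate, math.pi / 4)   # observed frequency versus sigma / S
```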
10. We can always substitute the whole plane for the domain S. To that end it suffices to set

$$\varphi(x, y) = 0$$

at all points not belonging to S. We shall then have

$$\varphi(x, y) \ge 0$$

everywhere and

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \varphi(x, y)\,dx\,dy = 1.$$

By doing so we have the advantage of stating results in a perfectly general 
form without mentioning the domain S. However, in dealing with 
particular problems, it is more convenient to consider only those points 
which can actually represent simultaneous values of the variables. 
The probability of the simultaneous inequalities

$$a < X < b; \qquad c < Y < d,$$

according to the general definition, is represented by the double integral

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy.$$

This corresponds to the compound probability of two events, and we must see that the fundamental theorem of compound probabilities continues to hold. Taking c = −∞, d = +∞, the repeated integral

$$\int_a^b dx \int_{-\infty}^{+\infty} \varphi(x, y)\,dy$$

represents the probability P(a, b) for the variable X (as if it were considered alone without any reference to Y) to have its value in (a, b). The function

$$f(x) = \int_{-\infty}^{+\infty} \varphi(x, y)\,dy$$

represents the density of probability of X. Thus

$$P(a, b) = \int_a^b f(x)\,dx.$$

In a similar way,

$$F(y) = \int_{-\infty}^{+\infty} \varphi(x, y)\,dx$$

represents the density of the probability of Y; and the probability Q(c, d) that this variable has its value in (c, d) is given by

$$Q(c, d) = \int_c^d F(y)\,dy.$$
Now the double integral

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy$$

can be written in either of the forms

$$\int_a^b f(x)\,dx \cdot \int_c^d F_1(y)\,dy, \qquad \int_c^d F(y)\,dy \cdot \int_a^b f_1(x)\,dx,$$

where

$$F_1(y) = \frac{\int_a^b \varphi(x, y)\,dx}{\int_a^b f(x)\,dx}, \qquad f_1(x) = \frac{\int_c^d \varphi(x, y)\,dy}{\int_c^d F(y)\,dy}$$

may be considered as densities of conditional probabilities, respectively, for Y when it is known that X has a value in (a, b), and for X when it is known that Y has a value in (c, d). The preceding expressions for the probability of the simultaneous inequalities

$$a < X < b, \qquad c < Y < d$$

have the same form as the theorem of compound probability and may be considered as its extension. The conditional probability for Y to have its value in (c, d) when it is known that X has its value in (a, b) is given by

$$\int_c^d F_1(y)\,dy.$$

Now, we define variables X and Y as independent when the probability for Y to be in (c, d) is not affected by the knowledge that X belongs to (a, b), which means that

$$\int_c^d F_1(y)\,dy = \int_c^d F(y)\,dy$$

or

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy = \int_a^b f(x)\,dx \cdot \int_c^d F(y)\,dy,$$

and, since the intervals (a, b) and (c, d) are arbitrary,

$$\varphi(x, y) = f(x)\,F(y)$$

at points of continuity. Hence, the density of probability for two independent variables is the product of a function of x alone by a function of y alone. Conversely, when this condition is satisfied the variables are independent. For independent variables the probability of the simultaneous inequalities

$$a < X < b, \qquad c < Y < d$$

has a simple expression

$$\int_a^b f(x)\,dx \cdot \int_c^d F(y)\,dy,$$

which is the product of the probability for X to have its value in the interval (a, b) by the probability for Y to have its value in the interval (c, d), in perfect analogy with the compound probability of two independent events.
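A small numerical example may help to separate the marginal densities from the joint density. The sketch below assumes the hypothetical joint density φ(x, y) = x + y on the unit square (0 elsewhere), whose marginal densities are f(x) = x + 1/2 and F(y) = y + 1/2. Since φ is not of the form f(x)F(y), the variables are dependent, and the probability of 0 < X < 1/2, 0 < Y < 1/2 (which is 1/8) differs from the product (3/8)(3/8) = 9/64 of the separate probabilities. The integrals are computed by the midpoint rule, which is exact here up to rounding because the integrand is linear; the helper names are illustrative.

```python
def double_integral(phi, a, b, c, d, steps=200):
    """Midpoint-rule approximation of the integral of phi over a < x < b, c < y < d."""
    hx, hy = (b - a) / steps, (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx
        for j in range(steps):
            total += phi(x, c + (j + 0.5) * hy)
    return total * hx * hy

def phi(x, y):
    """Hypothetical joint density x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

total_mass = double_integral(phi, 0, 1, 0, 1)   # should be 1
joint = double_integral(phi, 0, 0.5, 0, 0.5)    # P(0 < X < 1/2, 0 < Y < 1/2)
p_x = double_integral(phi, 0, 0.5, 0, 1)        # P(0 < X < 1/2), Y anywhere in (0, 1)
p_y = double_integral(phi, 0, 1, 0, 0.5)        # P(0 < Y < 1/2), X anywhere in (0, 1)
print(total_mass, joint, p_x * p_y)             # about 1, 0.125, 0.140625
```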

Finally, the mathematical expectation of any function ψ(x, y) can be defined by

$$E[\psi(x, y)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \psi(x, y)\,\varphi(x, y)\,dx\,dy,$$

provided the integral in the right member exists.

11. It is hardly necessary to dwell at length upon the case of several stochastic variables. A system of particular values x_1, x_2, ..., x_n of n stochastic variables X_1, X_2, ..., X_n may be considered as a point in n-dimensional space. The density of probability is a non-negative function φ(x_1, x_2, ..., x_n) defined in the whole space and satisfying the condition

$$\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} \varphi(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n = 1.$$

The probability for a point representing x_1, x_2, ..., x_n to be located in a given domain σ is given by the integral

$$\int\cdots\int_\sigma \varphi(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n$$

extended over σ. In the case of uniform distribution of probability, φ(x_1, x_2, ..., x_n) is by definition a constant in a certain finite region of space and 0 outside of that region. If V is the volume of that region and v the volume of the domain σ, the ratio v/V gives the probability that a point belongs to σ.
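In n dimensions the ratio v/V can again be estimated by random points. The sketch below is an illustration assuming the region is the unit cube (V = 1) and σ is the inscribed ball of radius 1/2, whose volume is π^{n/2}(1/2)^n / Γ(n/2 + 1) by the standard formula for the volume of an n-dimensional ball; the function names, seed and trial count are hypothetical choices.

```python
import math
import random

def ball_in_cube_ratio(n):
    """v / V for the ball of radius 1/2 inscribed in the unit cube (V = 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * 0.5 ** n

def estimate_ratio(n, trials=100_000, seed=4):
    """Fraction of uniform points in the unit cube that fall inside the inscribed ball."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sum((rng.random() - 0.5) ** 2 for _ in range(n)) < 0.25:
            hits += 1
    return hits / trials

for n in (2, 3, 5):
    print(n, estimate_ratio(n), ball_in_cube_ratio(n))
```

For n = 2 and n = 3 the exact ratios are π/4 and π/6.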

The probability of the simultaneous inequalities

$$a_1 < x_1 < b_1; \quad a_2 < x_2 < b_2; \quad \ldots; \quad a_n < x_n < b_n$$

is given by the integral

$$\int_{a_1}^{b_1}\int_{a_2}^{b_2}\cdots\int_{a_n}^{b_n} \varphi(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n,$$

which, by introduction of conditional probabilities as in the case of two variables, can be put into the form of a product of n integrals in a manner perfectly analogous to the expression of the probability of a compound event with n components. Finally, the variables are independent if the density φ(x_1, x_2, ..., x_n) is a product of n functions depending only upon x_1, x_2, ..., x_n, respectively, and conversely.

The expression

$$E[\psi(x_1, x_2, \ldots, x_n)] = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} \psi\,\varphi\,dx_1\,dx_2\cdots dx_n$$

serves to define the mathematical expectation of any function ψ(x_1, x_2, ..., x_n) of x_1, x_2, ..., x_n.

12. Since in introducing the extended idea of probability we took care to preserve the fundamental theorems of the calculus of probability, we may be sure that other theorems derived from them will hold for continuous variables. In particular, theorems concerning mathematical expectation and the fundamental lemma in Chap. X, Sec. 1, hold for continuous variables. As we have seen, the proof of the law of large numbers was built upon this basis. Hence, this important theorem applies equally to continuous variables.
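To see the law of large numbers at work for a continuous variable, the sketch below averages independent values that are uniform over (0, l), using `random.uniform` as a stand-in for such values; by Example 1 of Sec. 8 the mean is l/2. The value l = 4, the seed and the sample sizes are arbitrary.

```python
import random

def sample_mean_uniform(l, n, seed=5):
    """Arithmetic mean of n independent values, each uniform over (0, l)."""
    rng = random.Random(seed)
    return sum(rng.uniform(0, l) for _ in range(n)) / n

l = 4.0
for n in (10, 1_000, 100_000):
    print(n, sample_mean_uniform(l, n))   # tends toward E(x) = l/2 = 2.0 as n grows
```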


Geometrical Problems 

13. A few geometrical problems will afford a good illustration of the 
foregoing general principles. 

Problem 1. A rectilinear segment AB is divided by a point C into two parts AC = a, CB = b. Points X and Y are taken at random on AC and CB, respectively. What is the probability that AX, XY, BY can form a triangle?

[Figure: points A, X, C, Y, B on the segment AB.]

Solution. We must first agree upon the meaning of the expression “at random.” The idea suggested by this expression implies that the way of selecting points X and Y gives no preference to any point of AC and CB, respectively. Consequently, the variables x = AX and y = BY may be assumed to have uniform distributions of probability. The domain of the point x, y is a rectangle OMPN with the sides OM = a, ON = b. In order that AX, XY, BY can form a triangle the following inequalities must be fulfilled:

$$x < (a + b - x - y) + y \quad\text{or}\quad x < a + b - x,$$
$$y < (a + b - x - y) + x \quad\text{or}\quad y < a + b - y,$$
$$a + b - x - y < x + y.$$

These inequalities are equivalent to

$$x < \frac{a + b}{2}, \qquad y < \frac{a + b}{2}, \qquad x + y > \frac{a + b}{2}.$$

[Figure: the rectangle OMPN.]

To interpret them geometrically, draw through P a line QPR making ∠RQO = 45°. From the mid-point V of QR drop the perpendiculars VS, VW on OX, OY. Then the preceding inequalities limit the position of the point x, y to the shaded area SVW, whose part TSU is contained in the rectangle OMPN (here b ≤ a, so that S falls on OM). The variables x and y are independent and have uniform distributions. Hence, the density of probability of the pair x, y is constant and the probability that the point x, y is in the triangle TSU will be

$$\frac{\text{Area } TSU}{\text{Area } OMPN} = \frac{b^2/2}{ab} = \frac{b}{2a}.$$

At the same time this is the probability for AX, XY, BY to form a triangle.
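The answer b/(2a) can be checked by simulation. The Python sketch assumes, as the solution does, that x = AX and y = BY are independent and uniform on (0, a) and (0, b), and counts how often the three lengths AX, XY = a + b − x − y and BY satisfy the triangle inequalities; the parameters a = 3, b = 2 (so that b ≤ a), the seed and the trial count are illustrative.

```python
import random

def triangle_prob_problem1(a, b, trials=200_000, seed=6):
    """Monte Carlo for Problem 1: X uniform on AC (length a), Y uniform on CB (length b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.uniform(0, a), rng.uniform(0, b)   # x = AX, y = BY
        z = a + b - x - y                             # z = XY
        if x < y + z and y < x + z and z < x + y:     # triangle inequalities
            hits += 1
    return hits / trials

a, b = 3.0, 2.0
print(triangle_prob_problem1(a, b), b / (2 * a))   # both close to 1/3
```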

Problem 2. On a line AB two points X_1, X_2 are taken at random. What is the probability that AX_1, X_1X_2, X_2B can form a triangle?

[Figs. 6 and 7.]

Solution. The variables AX_1 = x_1, AX_2 = x_2 are independent and have uniform distribution of probability. The domain of all possible positions of the point x_1, x_2 is a square with the side AB = l. The positions of this point when AX_1, X_1X_2, X_2B form a triangle can be characterized as follows. First, if X_1 precedes X_2, we have

$$x_2 - x_1 < x_1 + l - x_2 \quad\text{or}\quad x_2 - x_1 < \frac{l}{2},$$
$$x_1 < x_2 - x_1 + l - x_2 \quad\text{or}\quad x_1 < \frac{l}{2},$$
$$l - x_2 < x_2 - x_1 + x_1 \quad\text{or}\quad x_2 > \frac{l}{2},$$

which means that the point x_1, x_2 belongs to the triangle ONP, the definition of which is evident if L, M, N, P are the mid-points of the sides of the square ABCD. Second, if X_1 follows X_2, we have

$$x_1 - x_2 < \frac{l}{2}, \qquad x_2 < \frac{l}{2}, \qquad x_1 > \frac{l}{2},$$

and these inequalities define the area OLM. Since the distribution of x_1, x_2 is uniform, the required probability is

$$\frac{\text{Area } OLM + \text{Area } ONP}{\text{Area } ABCD} = \frac{1}{4}.$$
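Similarly for Problem 2, the sketch below draws x_1, x_2 independently and uniformly on (0, l) and checks whether the three pieces can form a triangle; because the pieces sum to l, the triangle inequalities reduce to each piece being shorter than l/2. The function name, seed and trial count are assumptions.

```python
import random

def triangle_prob_problem2(l=1.0, trials=200_000, seed=7):
    """Monte Carlo for Problem 2: X1, X2 uniform on AB of length l."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.uniform(0, l), rng.uniform(0, l)
        lo, hi = min(x1, x2), max(x1, x2)
        pieces = (lo, hi - lo, l - hi)     # the three pieces in order along AB
        if max(pieces) < l / 2:            # triangle inequalities for pieces summing to l
            hits += 1
    return hits / trials

print(triangle_prob_problem2())   # close to 1/4
```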

Problem 3. A chord is drawn at random in a given circle. What is 
the probability that it is greater than the side of the equilateral triangle 
inscribed in that circle? 
Solution 1. The position of the chord drawn at random can be determined by its distance from the center of the circle. This distance may vary between 0 and R, the radius of the circle. The chord is greater than the side of the equilateral triangle inscribed in the circle if its distance from the center is less than R/2. Hence, the required probability is

$$p_1 = \frac{R/2}{R} = \frac{1}{2}.$$

Solution 2. Through one end A of the chord draw a tangent AT. The angle φ between the chord and the tangent, varying from 0° to 180°, determines the position of the chord. If the chord is greater than the side of the inscribed equilateral triangle, the angle φ must lie between 60° and 120°. Hence the answer

$$p_2 = \frac{120^\circ - 60^\circ}{180^\circ} = \frac{1}{3}.$$

[Fig. 8.]


The fact that we obtain two different numbers for the same probability seems paradoxical, and the problem itself is known as “Bertrand’s paradox.” However, going attentively over both solutions, we discover that we are really dealing with two different problems. In the first solution it was assumed that the distance of the chord from the center has uniform distribution, while in the second solution the distribution of the angle φ was taken as uniform. The second solution may be considered reasonable if a thin bar or a needle can rotate freely about A and if, being set in motion, it determines the chord AB by its ultimate position. On the other hand, the first solution is acceptable if a circular disk is thrown upon a board ruled with parallel lines distant from one another by the diameter of the disk. The intersection of the disk with one of the lines determines a chord, and the probability that it is greater than the side of the inscribed equilateral triangle can reasonably be assumed to be 1/2.

A general remark applies to all problems of this kind. When a certain geometrical element, such as a point or a line, is supposed to be taken at random, it should be clearly indicated by what kind of mechanism this is to be done. Only then can the hypothetically assumed distribution be put to an experimental test and either confirmed (approximately) or rejected.
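That the answer depends on the mechanism can be seen in a simulation of the two mechanisms just described. The sketch assumes, for Solution 1, a distance δ from the center uniform on (0, R), giving a chord of length 2√(R² − δ²), and for Solution 2 an angle φ with the tangent uniform on (0, π), giving a chord of length 2R sin φ (the angle between a tangent and a chord equals the inscribed angle on that chord). The function name, seed and trial count are illustrative.

```python
import math
import random

def bertrand(trials=200_000, R=1.0, seed=8):
    """Frequencies of 'chord longer than R*sqrt(3)' under the two mechanisms."""
    rng = random.Random(seed)
    side = R * math.sqrt(3)   # side of the inscribed equilateral triangle
    hits1 = hits2 = 0
    for _ in range(trials):
        # Solution 1: distance of the chord from the centre uniform on (0, R)
        delta = rng.uniform(0, R)
        if 2 * math.sqrt(R * R - delta * delta) > side:
            hits1 += 1
        # Solution 2: angle between the chord and the tangent at A uniform on (0, pi)
        phi = rng.uniform(0, math.pi)
        if 2 * R * math.sin(phi) > side:
            hits2 += 1
    return hits1 / trials, hits2 / trials

p1, p2 = bertrand()
print(p1, p2)   # close to 1/2 and 1/3 respectively
```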

14. Buffon’s Needle Problem. A board is ruled with equidistant 
parallel lines, the width of the strip between two consecutive lines being 
d. A needle so fine that it can be likened to a rectilinear segment of the 
length I < d is thrown on the board. What is the probability that the 
needle will intersect one of the lines (naturally not more than one)? 

Solution. This is the oldest problem dealing with geometrical 
probabilities. It was mentioned by Buffon, the celebrated French 
naturalist of the eighteenth century, in the Proceedings of the Paris Academy of Sciences (1733) and later reproduced with its solution in Buffon’s book “Essai d’arithmétique morale,” published in 1777.

Let us determine the position of the needle by the distance OP = x of its middle point from the nearest line, and the acute angle φ between OP and the needle. The variables x and φ may be considered as independent. Furthermore, x and φ vary respectively between 0 and d/2 and between 0 and π/2. As a hypothesis we assume the distribution of probability for x and φ as uniform.

[Figs. 9 and 10.]

The domain of x, φ is a rectangle OABC with OA = π/2, OC = d/2. Now, the needle intersects one of the lines if

$$x < \frac{l}{2}\cos\varphi,$$

and then the point x, φ lies in the shaded area below the curve

$$x = \frac{l}{2}\cos\varphi.$$

Since the distribution of x, φ is uniform, the required probability will be

$$p = \frac{\text{Area } OAD}{\text{Area } OABC}.$$

But

$$\text{Area } OAD = \frac{l}{2}\int_0^{\pi/2}\cos\varphi\,d\varphi = \frac{l}{2}, \qquad \text{Area } OABC = \frac{\pi}{2}\cdot\frac{d}{2} = \frac{\pi d}{4},$$

and consequently

$$p = \frac{2l}{\pi d}.$$

On pages 112-113 an account was given of experiments made by several authors in connection with Buffon’s problem. They all show good agreement with the theory and indirectly confirm the hypothesis assumed in deriving the above expression for probability.
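A computer version of such an experiment can be sketched in Python under the same hypothesis, namely x uniform on (0, d/2) and φ uniform on (0, π/2), independently; the needle crosses a line when x < (l/2) cos φ. Note that the sketch uses `math.pi` to generate φ, so it checks the formula 2l/(πd) rather than furnishing an independent estimate of π. The values l = 1, d = 2, the seed and the trial count are arbitrary.

```python
import math
import random

def buffon(l, d, trials=200_000, seed=9):
    """Frequency of crossings with x ~ U(0, d/2) and phi ~ U(0, pi/2), independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, d / 2)            # distance of the midpoint from the nearest line
        phi = rng.uniform(0, math.pi / 2)    # acute angle between OP and the needle
        if x < (l / 2) * math.cos(phi):      # the needle reaches the line
            hits += 1
    return hits / trials

l, d = 1.0, 2.0
observed = buffon(l, d)
print(observed, 2 * l / (math.pi * d))
```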

15. Extension of Buffon’s Problem. A thin plate in the shape of a convex polygon, of dimensions so small that it cannot intersect two of the lines simultaneously, is thrown on a board ruled as in Buffon’s needle problem. What is the probability that the boundary of the plate will intersect one of the lines?

Solution. Suppose that the polygonal boundary has five sides. Let these sides (and their lengths) be denoted by

$$\alpha, \beta, \gamma, \delta, \varepsilon.$$

Each of them is shorter than the distance d between two consecutive lines. On account of convexity, a line can intersect either none or two (and only two) sides. Accordingly, combining sides in pairs, we can distinguish 10 mutually exclusive cases and denote their probabilities by

$$(\alpha\beta), (\alpha\gamma), (\alpha\delta), (\alpha\varepsilon), (\beta\gamma), (\beta\delta), (\beta\varepsilon), (\gamma\delta), (\gamma\varepsilon), (\delta\varepsilon).$$

The required probability will be given by the sum

$$p = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\varepsilon) + (\beta\gamma) + (\beta\delta) + (\beta\varepsilon) + (\gamma\delta) + (\gamma\varepsilon) + (\delta\varepsilon).$$

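On the other hand, the side α can be intersected by a line in four mutually exclusive ways; namely, together with β, or γ, or δ, or ε. Hence, if (α) is the probability of intersection of α,

$$(\alpha) = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\varepsilon),$$

and similarly

$$(\beta) = (\beta\alpha) + (\beta\gamma) + (\beta\delta) + (\beta\varepsilon),$$
$$(\gamma) = (\gamma\alpha) + (\gamma\beta) + (\gamma\delta) + (\gamma\varepsilon),$$
$$(\delta) = (\delta\alpha) + (\delta\beta) + (\delta\gamma) + (\delta\varepsilon),$$
$$(\varepsilon) = (\varepsilon\alpha) + (\varepsilon\beta) + (\varepsilon\gamma) + (\varepsilon\delta),$$

whence

$$(\alpha) + (\beta) + (\gamma) + (\delta) + (\varepsilon) = 2p.$$

But, by the solution of Buffon’s problem applied to each side,

$$(\alpha) = \frac{2\alpha}{\pi d}, \quad (\beta) = \frac{2\beta}{\pi d}, \quad \ldots, \quad (\varepsilon) = \frac{2\varepsilon}{\pi d},$$

and consequently

$$p = \frac{\alpha + \beta + \gamma + \delta + \varepsilon}{\pi d} = \frac{P}{\pi d},$$

where P is the perimeter of the polygonal boundary. Evidently this result is perfectly general. Since it does not depend upon the number of sides, by passage to the limit it can be extended to the case of a plate bounded by any convex curve.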
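The result p = P/(πd) can be tested on a particular convex plate. The sketch below assumes a square of side s whose diagonal s√2 is less than d, a distance x from its center to the nearest line uniform on (0, d/2), and an orientation θ uniform on (0, π/2) (enough, by the symmetry of the square). The boundary meets a line exactly when x is less than the half-width (s/2)(cos θ + sin θ) of the square measured perpendicular to the lines, and the predicted probability is 4s/(πd). The names, seed and trial count are hypothetical.

```python
import math
import random

def square_plate(s, d, trials=200_000, seed=10):
    """Frequency with which the boundary of a square plate of side s meets a line.

    Assumes s * sqrt(2) < d, so that at most one line can be met."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, d / 2)                # centre to the nearest line
        theta = rng.uniform(0, math.pi / 2)      # orientation of the square
        half_width = (s / 2) * (math.cos(theta) + math.sin(theta))
        if x < half_width:
            hits += 1
    return hits / trials

s, d = 1.0, 2.0
observed = square_plate(s, d)
print(observed, 4 * s / (math.pi * d))   # perimeter P = 4s
```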

16. Second Solution of Buffon’s Problem. Barbier has given another extremely ingenious solution of Buffon’s problem and of its extension. Let f(l) be the unknown probability that a needle of length l will intersect a line.





Imagine that the needle is divided into two parts l' and l''. Evidently a line intersects the needle if, and only if, it intersects either the first or the second part. Hence, by the theorem of total probabilities,

$$f(l) = f(l') + f(l''),$$

whence, as in Sec. 4, we conclude

$$f(l) = Cl,$$

where C is a constant independent of l. The whole question is how to determine this constant. Barbier’s ingenious idea was to make this problem depend on the solution of another one: a polygonal line (convex or not) is thrown upon the board; what is the mathematical expectation of the number of points of intersection? The perimeter of the polygonal line can be subdivided into n rectilinear parts a_1, a_2, ..., a_n, all less than d. With these n parts we can associate n variables x_1, x_2, ..., x_n such that

$$x_i = 1 \ \text{ if one of the lines intersects } a_i, \qquad x_i = 0 \ \text{ otherwise}.$$

The sum 


5 = aji + 0:2 + * * * + iCn 


evidently gives the total number of the points of intersection. Hence 

E(8) = E(Xi) + E{X^ -j- . . . _j_ E(Xn) 

and, if is the probability of intersection of a,- with one (and only one) 
line, 

E{xi) = Pi. 

But, according to the previous result. 

Pi = Coi. 

Hence, we have a perfectly general formula 

E{s) = C(ai + 02 + * ' • + On) = CP 

where $P$ is the perimeter of the polygonal line. The result holds for any curvilinear arc (closed or not), as can be seen by the method of limits. This formula applied to a circle with the diameter $d$ gives 

$$C \cdot \pi d = 2,$$

since such a circle always has exactly two points of intersection with the lines of the system (tangency having probability zero). Thus we find that 

$$C = \frac{2}{\pi d}$$

and 

$$f(l) = \frac{2l}{\pi d},$$

as obtained before. For a closed convex line of sufficiently small dimensions only two cases are possible: two intersections (probability $p$), or none (probability $1 - p$), whence $E(s) = 2p$ and 

$$2p = \frac{2P}{\pi d},$$

or 

$$p = \frac{P}{\pi d},$$

in agreement with the result obtained in Sec. 15. 
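Barbier's formula $E(s) = CP = 2P/\pi d$ holds whatever the length of the arc. The sketch below checks it for a straight needle that may be longer than $d$; it assumes, as in Buffon's problem, that the height of the midpoint and the angle with the lines are uniformly distributed, and simply counts the lines cut on each throw. The function name is illustrative.

```python
import math
import random


def mean_crossings(length, d, trials=200_000, seed=2):
    """Average number of lines y = k*d cut by a needle of the given length."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        y = rng.uniform(0.0, d)            # height of the midpoint within one strip
        phi = rng.uniform(0.0, math.pi)    # angle between needle and lines
        h = 0.5 * length * math.sin(phi)   # half of the needle's extent across the lines
        # number of multiples of d lying in the interval (y - h, y + h]
        total += math.floor((y + h) / d) - math.floor((y - h) / d)
    return total / trials


d = 1.0
results = {}
for length in (0.5, 1.0, 2.5):
    results[length] = mean_crossings(length, d)
    print(length, round(results[length], 4), round(2 * length / (math.pi * d), 4))
```

The length 2.5 exceeds $d$, so a single throw may cut up to three lines, yet the mean still follows $2l/\pi d$.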

17. Laplace's Problem. A board is covered with a set of congruent rectangles as shown in the figure, and a thin needle is thrown on the board. Supposing that the needle is shorter than the smaller sides of the rectangles, find the probability that the needle will be entirely contained in one of the rectangles of the set. 

Solution. Let $AB = a$, $AD = b$ be the sides of the rectangle which contains the middle point of the needle, the length of which is $l$ ($l < a$, $l < b$). 

Taking $AB$ and $AD$ for coordinate axes, the position of the needle is determined by two coordinates $x, y$ of its middle point and the angle $\varphi$ formed by the needle with the $x$ axis. We may consider $x, y, \varphi$ as three independent variables with uniform distribution of probability. The domain filled up with all possible points $x, y, \varphi$ is a parallelepipedon 

$$0 < x < a;\quad 0 < y < b;\quad -\frac{\pi}{2} \le \varphi \le \frac{\pi}{2}$$

and the distribution of probability throughout this domain is uniform. To characterize the domain of points representing positions of the middle point of the needle when it is located entirely within $ABCD$ we consider the sections of that domain by planes $\varphi = \text{constant}$ and their projections on the plane $xy$. These projections are represented by the shaded areas in Figs. 13 and 14 corresponding to positive and negative $\varphi$, respectively. 

In Fig. 13 

$$\angle PAB = \varphi;\quad AP \parallel BF \parallel CR \parallel DG$$

and $AP = BE = BF = CR = DG = DH = l/2$. Similarly, in the second figure 

$$\angle JAB = \varphi;\quad AJ \parallel BQ \parallel CL \parallel DS$$

and $AJ = AK = BQ = CL = CM = l/2$. 

The area of the rectangle $PQRS$ corresponding to these two cases can be expressed as follows: 

$$\text{Area } PQRS = (a - l\cos\varphi)(b - l\sin\varphi) = ab - l(b\cos\varphi + a\sin\varphi) + l^2\sin\varphi\cos\varphi,$$

$$\text{Area } PQRS = (a - l\cos\varphi)(b + l\sin\varphi) = ab - l(b\cos\varphi - a\sin\varphi) - l^2\sin\varphi\cos\varphi.$$

Without distinguishing positive and negative values of $\varphi$ we may write 

$$F(\varphi) = \text{area } PQRS = ab - bl\cos\varphi - al|\sin\varphi| + \tfrac{1}{2}l^2|\sin 2\varphi|.$$

The volume of the domain representing positions of the needle entirely within $ABCD$ is 

$$v = \int_{-\pi/2}^{\pi/2} F(\varphi)\,d\varphi = \pi ab - 2bl - 2al + l^2,$$

while 

$$V = \pi ab$$

is the volume of the domain 

$$0 < x < a,\quad 0 < y < b,\quad -\frac{\pi}{2} \le \varphi \le \frac{\pi}{2}.$$

Hence, the required probability is 

$$p = \frac{v}{V} = 1 - \frac{2l(a + b) - l^2}{\pi ab},$$

and the complementary probability for the needle to intersect the boundary of one of the rectangles is 

$$q = \frac{2l(a + b) - l^2}{\pi ab}.$$

Buffon's problem may be considered as a limiting case when $a \to \infty$ and, indeed, by setting $a = \infty$, we find that 

$$q = \frac{2l}{\pi b}$$

in conformity with the result in Sec. 14. 
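Laplace's result can be checked by sampling the three uniform variables $x, y, \varphi$ of the solution directly. This is an illustrative sketch with hypothetical function names. The needle lies inside $ABCD$ exactly when its midpoint is more than $\tfrac12 l|\cos\varphi|$ from the sides $x = 0$, $x = a$ and more than $\tfrac12 l|\sin\varphi|$ from $y = 0$, $y = b$. The last line also shows the Buffon limit $a \to \infty$ numerically.

```python
import math
import random


def laplace_inside(a, b, l, trials=200_000, seed=3):
    """Monte Carlo estimate of the probability that a needle of length l
    (l < a, l < b) lies wholly inside the a-by-b rectangle containing its midpoint."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a)                       # midpoint, uniform over the rectangle
        y = rng.uniform(0.0, b)
        phi = rng.uniform(-math.pi / 2, math.pi / 2)  # angle with the x axis
        hx = 0.5 * l * abs(math.cos(phi))             # half-projections of the needle
        hy = 0.5 * l * abs(math.sin(phi))
        if hx < x < a - hx and hy < y < b - hy:
            inside += 1
    return inside / trials


def laplace_exact(a, b, l):
    """p = 1 - (2l(a + b) - l^2) / (pi a b)."""
    return 1 - (2 * l * (a + b) - l * l) / (math.pi * a * b)


a, b, l = 2.0, 1.0, 0.6
estimate = laplace_inside(a, b, l)
print(estimate, laplace_exact(a, b, l))
# for a very long rectangle the complementary probability approaches 2l/(pi b)
print(1 - laplace_exact(1e9, b, l), 2 * l / (math.pi * b))
```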

These examples may suffice to give an idea of problems in geometric 
probabilities. Sylvester, Crofton, and others have enriched this field 
by extremely ingenious methods of evaluating, or rather of avoiding the 
evaluation of, very complicated multiple integrals. However, from the 
standpoint of principles, these investigations, ingenious as they are, 
do not contribute much to the general theory of probability. 


Problems for Solution 

1. A point $X$ is taken at random on a rectilinear segment $AB$ whose middle point is $O$. What is the probability that $AX$, $BX$, and $AO$ can form a triangle? The distribution of $AX = x$ is assumed to be uniform. Ans. 1/2. 

2. Two points $X_1$, $X_2$ are taken at random on $AB = l$. Assuming uniform distribution of probability, what is the mathematical expectation of any power $n$ of the distance between $X_1$ and $X_2$? Ans. 

$$\frac{1}{l^2}\int_0^l\!\!\int_0^l |x_1 - x_2|^n\,dx_1\,dx_2 = \frac{2l^n}{(n + 1)(n + 2)}.$$

3. Three points $X_1$, $X_2$, $X_3$ are taken at random on $AB$. What is the probability that $X_2$ lies between $X_1$ and $X_3$? 

Ans. 1/3, assuming uniform distribution of probability. 

4. A rectilinear segment $AB$ is divided into four equal parts 

$$AC = CO = OD = DB.$$

[Fig. 16.] 

Supposing that the distribution of probability is symmetric with respect to $O$, let $P$ be the probability that a point selected at random on $AB$ will be between $C$ and $D$. Also, let $Q$ be the probability that the middle point between two points selected at random will be between $C$ and $D$. Prove that 

$$Q \ge \frac{1 + P^2}{2}.$$

Hint: The middle point of a segment $X_1X_2$ is surely between $C$ and $D$ if: (i) $X_1$ and $X_2$ are in $CO$; or (ii) $X_1$ and $X_2$ are in $OD$; or (iii) $X_1$ and $X_2$ are on opposite sides of $O$. 

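5. Two points $X_1$, $X_2$ are chosen at random in a circle of radius $r$. Assuming uniform distribution of probability, what is the mathematical expectation of their distance? Ans. Denoting the required mathematical expectation by $M$, we have 

$$\pi^2 r^4 M = \int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^r\!\!\int_0^r f(\rho, \theta, \rho', \theta')\,\rho\,\rho'\,d\rho\,d\rho'\,d\theta\,d\theta' = F(r)$$

where 

$$f(\rho, \theta, \rho', \theta') = \sqrt{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\theta - \theta')}.$$

Hence, varying $r$ by $dr$, 

$$dF = 2r\,dr\int_0^{2\pi}\!\!\int_0^{2\pi}\!\!\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos(\theta - \theta')}\,\rho\,d\rho\,d\theta\,d\theta'$$

and 

$$d(\pi^2 r^4 M) = 4\pi r\,dr\int_0^{2\pi}\!\!\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos\omega}\,\rho\,d\rho\,d\omega.$$

By introduction of new polar coordinates (Fig. 17) the integral in the right member can be exhibited as 

$$\int_{-\pi/2}^{\pi/2} d\omega\int_0^{2r\cos\omega} u^2\,du = \frac{8r^3}{3}\int_{-\pi/2}^{\pi/2}\cos^3\omega\,d\omega = \frac{32r^3}{9}.$$

Thus 

$$d(\pi^2 r^4 M) = \frac{128\pi}{9}\,r^4\,dr,$$

whence 

$$M = \frac{128r}{45\pi}.$$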

6. A board is covered with congruent rectangles as in Laplace's problem. A coin the diameter of which is less than the smaller side of the rectangles is thrown on the board. What is the probability that it will be partly in one rectangle and partly in another? Ans. $a$, $b$, $r$ being respectively the sides of the rectangles and radius of the coin, the required probability is 

$$\frac{2r(a + b - 2r)}{ab}.$$

7. Solve Buffon's problem when the needle is longer than the distance between two consecutive lines. Ans. The probability for the needle to intersect at least one line is 

$$p = \frac{2l}{\pi d}(1 - \sin\varphi_0) + \frac{2\varphi_0}{\pi},$$

where $\varphi_0$ is determined by $\cos\varphi_0 = d/l$. 
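A quick Monte Carlo check of this answer, assuming the usual Buffon model (midpoint height and angle uniform). The needle meets at least one line when the interval it spans across the lines contains a multiple of $d$. Function names are illustrative.

```python
import math
import random


def prob_at_least_one(l, d, trials=200_000, seed=5):
    """Monte Carlo probability that a needle of length l meets at least one line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.uniform(0.0, d)            # midpoint height within one strip
        phi = rng.uniform(0.0, math.pi)    # angle with the lines
        h = 0.5 * l * math.sin(phi)
        # the interval (y - h, y + h] contains a multiple of d
        if math.floor((y + h) / d) != math.floor((y - h) / d):
            hits += 1
    return hits / trials


def long_needle_formula(l, d):
    """p = 2l/(pi d) * (1 - sin phi0) + 2 phi0 / pi with cos phi0 = d / l (l >= d)."""
    phi0 = math.acos(d / l)
    return 2 * l / (math.pi * d) * (1 - math.sin(phi0)) + 2 * phi0 / math.pi


l, d = 2.0, 1.0
estimate = prob_at_least_one(l, d)
print(estimate, long_needle_formula(l, d))   # the formula gives about 0.837
```

At $l = d$ the formula reduces to $2/\pi$, matching the ordinary Buffon result.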

8. A board is covered with congruent triangles whose sides are $a$, $b$, $c$. A needle whose length is less than the shortest altitude of any one of these triangles is thrown on the board. What is the probability that the needle will be contained entirely within one of the triangles? Ans. The required probability is 

$$p = 1 + \frac{(Aa^2 + Bb^2 + Cc^2)\,l^2}{2\pi Q^2} - \frac{(4a + 4b + 4c - 3l)\,l}{2\pi Q},$$

where $A$, $B$, $C$ are angles opposite to sides $a$, $b$, $c$ and $Q$ is double the area of the triangle. 
For equilateral triangles 

$$p = 1 + \frac{2l^2}{3a^2} - \frac{\sqrt{3}\,l(4a - l)}{\pi a^2}.$$
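The equilateral case can be checked in the same way. This sketch assumes sides of length $a$, draws the midpoint uniformly in one triangle and the direction uniformly, and uses the fact that a convex region contains a segment exactly when it contains both of its ends. The helper names are hypothetical.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def in_triangle(px, py, a):
    """Is (px, py) inside the triangle with vertices (0, 0), (a, 0), (a/2, a*sqrt(3)/2)?"""
    return py >= 0.0 and py <= SQRT3 * px and py <= SQRT3 * (a - px)


def needle_inside_prob(a, l, trials=200_000, seed=6):
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:                    # fold the parallelogram onto the triangle
            u, v = 1.0 - u, 1.0 - v
        mx = u * a + v * a / 2             # midpoint = u*(a, 0) + v*(a/2, a*sqrt(3)/2)
        my = v * a * SQRT3 / 2
        phi = rng.uniform(0.0, math.pi)
        dx, dy = 0.5 * l * math.cos(phi), 0.5 * l * math.sin(phi)
        # a convex region contains the segment iff it contains both ends
        if in_triangle(mx + dx, my + dy, a) and in_triangle(mx - dx, my - dy, a):
            inside += 1
    return inside / trials


def equilateral_formula(a, l):
    return 1 + 2 * l**2 / (3 * a**2) - SQRT3 * l * (4 * a - l) / (math.pi * a**2)


a, l = 1.0, 0.5          # l is below the altitude a*sqrt(3)/2
estimate = needle_inside_prob(a, l)
print(estimate, equilateral_formula(a, l))
```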



9. On each of the circles $O_1, O_2, O_3, \ldots$ with respective radii $r_1, r_2, r_3, \ldots$ points $M_1, M_2, M_3, \ldots$ are taken at random. Supposing that the series 

$$r_1 + r_2 + r_3 + \cdots$$

is divergent, while the series 

$$r_1^2 + r_2^2 + r_3^2 + \cdots = S$$

is convergent, prove that the probability that the length of the vector 

$$\overrightarrow{OM} = \overrightarrow{O_1M_1} + \overrightarrow{O_2M_2} + \cdots + \overrightarrow{O_nM_n}$$

will be $> R$ tends to 0 as $R \to \infty$, no matter how large $n$ is. 

[Fig. 18.] 

Indication of Solution. Let $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$ be components of $O_1M_1, O_2M_2, \ldots, O_nM_n$ on two rectangular axes $OX$, $OY$. Then 

$$E(x_i) = E(y_i) = 0,\qquad E(x_i^2) = E(y_i^2) = \frac{r_i^2}{2}.$$

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities $Q$ and $Q'$ of the inequalities 

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}},\qquad |y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}}$$

are both less than $S/R^2$. Now, if the length $OM > R$, then either 

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}}$$

or 

$$|y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}}.$$

Hence, the probability $P$ for the length of $OM$ to be $> R$ is less than $Q + Q'$; that is, 

$$P < Q + Q' < \frac{2S}{R^2}.$$

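10. Prove that 

$$\lim_{n \to \infty}\int_0^1\!\!\int_0^1\!\!\cdots\!\int_0^1 \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}\,dx_1\,dx_2\cdots dx_n = \frac{2}{3}.$$

Hint: Considering $x_1, x_2, \ldots, x_n$ as continuous stochastic variables with uniform distribution over the interval (0, 1), prove with the help of Tshebysheff's inequality that the probability of 

$$\frac{2}{3} - \varepsilon < \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} < \frac{2}{3} + \varepsilon$$

for any $\varepsilon > 0$ tends to 1 as $n \to \infty$. 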
CHAPTER XIII 

THE GENERAL CONCEPT OF DISTRIBUTION 

1. In dealing with continuous stochastic variables we have introduced the important concept of the function of distribution. Denoting the density of probability by $f(z)$, this function was defined by 

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

and it represents the probability of the inequality 

$$x < t.$$

For a variable with a finite number of values the function of distribution can be defined as the sum 

$$F(t) = \sum_{x_i < t} p_i$$

where $p_1, p_2, \ldots, p_n$ are respective probabilities of all possible values $x_1, x_2, \ldots, x_n$ of the variable $x$. The notation $x_i < t$ is intended to show that the summation is extended over all values of $x$ less than $t$. Again, $F(t)$ for any real $t$ represents the probability of the inequality 

$$x < t.$$

In this case $F(t)$ is a discontinuous function, never decreasing and varying between $F(-\infty) = 0$ and $F(+\infty) = 1$. Its discontinuities are located at the points $x_1, x_2, \ldots, x_n$ and are such that 

$$F(x_i + 0) - F(x_i - 0) = p_i,$$

denoting, in the customary way, 

$$F(x_i + 0) = \lim F(x_i + \varepsilon),\qquad F(x_i - 0) = \lim F(x_i - \varepsilon)$$

when $\varepsilon$, through positive values, converges to 0. To represent $F(t)$ graphically we note that, with the values arranged in increasing order $x_1 < x_2 < \cdots < x_n$, 

$$\begin{aligned}
F(t) &= 0 && \text{for } t < x_1,\\
F(t) &= p_1 && \text{for } x_1 < t < x_2,\\
F(t) &= p_1 + p_2 && \text{for } x_2 < t < x_3,\\
&\;\;\vdots\\
F(t) &= p_1 + p_2 + \cdots + p_n && \text{for } x_n < t.
\end{aligned}$$

As for the value of $F(t)$ at the point $t = x_i$, it is $F(x_i - 0)$. Hence, the graph of $F(t)$ consists of rectilinear segments as shown in the figure (for $n = 4$; $x_1 = -2$; $x_2 = 0$; $x_3 = 1$; $x_4 = 3$; $p_1 = p_2 = p_3 = p_4 = \tfrac14$) and belongs to the so-called step lines. 
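The step function of the figure can be tabulated directly. In this sketch (Python, exact fractions) `F` follows the convention $F(t) = P(x < t)$, so that its value at a jump is $F(x_i - 0)$, while the helper `F_plus`, introduced here for illustration, gives $F(x_i + 0)$; their difference at each $x_i$ is the jump $p_i$.

```python
from fractions import Fraction

# the example of the figure: four values, each with probability 1/4
values = [-2, 0, 1, 3]
probs = [Fraction(1, 4)] * 4


def F(t):
    """F(t) = probability of x < t (sum of p_i over x_i < t)."""
    return sum((p for x, p in zip(values, probs) if x < t), Fraction(0))


def F_plus(t):
    """F(t + 0) = sum of p_i over x_i <= t."""
    return sum((p for x, p in zip(values, probs) if x <= t), Fraction(0))


for t in (-3, -2, -1, 0, 0.5, 1, 2, 3, 4):
    print(t, F(t), F_plus(t))
```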

Thus, in case of a continuous variable the distribution function is 
given by an integral, and in case of a discontinuous variable, by a sum. 
In stating theorems equally true for continuous and discontinuous 
variables, it would be tedious always to distinguish these two cases. 
The question naturally arises whether it is possible to represent distribution functions, moments, and similar quantities by using new symbols equally applicable to continuous and discontinuous variables. In a similar kind of investigation Stieltjes was confronted with the same difficulties and he succeeded in overcoming them by introducing a new kind of integrals known as Stieltjes' integrals. 

[Fig. 19. Graph of the step function $F(t)$ of the example, on an axis running from $-\infty$ to $+\infty$ with marks at $-2$, 0, 1, 3.] 

Stieltjes' Integrals 

2. Let $\varphi(x)$ be a never decreasing function defined in the interval $a \le x \le b$. For any particular value of the argument both the limits (for $\varepsilon$ converging to 0 through positive values) 

$$\lim \varphi(x_0 + \varepsilon) = \varphi(x_0 + 0),\qquad \lim \varphi(x_0 - \varepsilon) = \varphi(x_0 - 0)$$

exist. Since evidently 

$$\varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0),$$

$x_0$ is a point of continuity of $\varphi(x)$ if 

$$\varphi(x_0 - 0) = \varphi(x_0 + 0).$$

If, however, 

$$\varphi(x_0 - 0) < \varphi(x_0 + 0),$$

$\varphi(x)$ is discontinuous at $x_0$, and the difference 

$$m_0 = \varphi(x_0 + 0) - \varphi(x_0 - 0)$$

gives the measure of discontinuity or simply discontinuity. Since for any number of points of discontinuity $x_0, x_1, \ldots, x_n$ the sum of discontinuities 

$$m_0 + m_1 + \cdots + m_n \le \varphi(b) - \varphi(a),$$

the points of discontinuity form a countable set. For there are only a finite number of discontinuities above any given number, so that, considering the sequence 

$$\delta > \delta_1 > \delta_2 > \cdots$$

tending to 0, there is only a finite number of points with discontinuities $> \delta$; also a finite number of points with discontinuities $\le \delta$ and $> \delta_1$, and so on. It follows that points of discontinuity can be arranged into a single sequence and hence form a countable set. 

It may happen, however, that $\varphi(x)$ has discontinuities in any interval, no matter how small; but at any rate there are points of continuity in any interval. If $\varphi(x_0 + \varepsilon) > \varphi(x_0 - \varepsilon)$ for all sufficiently small $\varepsilon > 0$ the point $x_0$ is called a "point of increase" of $\varphi(x)$. In particular, any point of discontinuity is a point of increase. 

3. Let $f(x)$ be a continuous function in the interval $a \le x \le b$. By inserting points $x_1 < x_2 < \cdots < x_n$ this interval is subdivided into $n + 1$ partial intervals. In each of these we arbitrarily select points $\xi_0, \xi_1, \ldots, \xi_n$ and form the sum 

$$S = f(\xi_0)[\varphi(x_1) - \varphi(a)] + f(\xi_1)[\varphi(x_2) - \varphi(x_1)] + \cdots + f(\xi_n)[\varphi(b) - \varphi(x_n)].$$

It can be proved in the same way as for ordinary integrals that when all intervals 

$$x_1 - a,\quad x_2 - x_1,\quad \ldots,\quad b - x_n$$

tend to zero uniformly, the sum $S$ tends to a definite limit. This limit, called Stieltjes' integral, does not depend upon the manner of subdividing the interval $(a, b)$ or upon the choice of points $\xi_0, \xi_1, \ldots, \xi_n$. It has a perfectly definite value as soon as $f(x)$ and $\varphi(x)$ (together with $a$, $b$) are given, and accordingly is denoted by 

$$\int_a^b f(x)\,d\varphi(x).$$

In case $\varphi(x)$ has a continuous derivative, $d\varphi(x)$ can be interpreted as the ordinary differential; Stieltjes' integral then coincides with the ordinary one. In other cases $d\varphi(x)$ is a new symbol introduced as a reminder of the origin of Stieltjes' integral. In particular, if $\varphi(x)$ is a step function with discontinuities $p_1, p_2, p_3, \ldots$ at the points $x_1, x_2, x_3, \ldots$, Stieltjes' integral coincides with the sum 

$$\sum_i p_i f(x_i)$$

which is a finite sum or an absolutely convergent infinite series according as the set of points of discontinuity is finite or infinite. 
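The definition of Stieltjes' integral can be tried out directly. In this sketch the partial intervals are equal and each $\xi_k$ is the midpoint of its interval (one admissible choice among many); the function names are hypothetical. With the step function of Fig. 19 as $\varphi$, the sum approaches $\sum p_i f(x_i)$; with the smooth $\varphi(x) = x^2$ it approaches the ordinary integral $\int_0^1 f(x)\,2x\,dx$.

```python
import math

# step integrator: jumps of 1/4 at -2, 0, 1, 3 (the distribution of Fig. 19)
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.25] * 4


def step_phi(x):
    return sum(p for xi, p in zip(values, probs) if xi < x)


def stieltjes_sum(f, phi, a, b, n):
    """S = sum f(xi_k)[phi(x_{k+1}) - phi(x_k)] over n equal partial intervals
    of (a, b), each xi_k taken at the midpoint of its interval."""
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * width
        right = a + (k + 1) * width       # same formula as the next left end, so jumps are not lost
        total += f(0.5 * (left + right)) * (phi(right) - phi(left))
    return total


# step phi: the sum tends to sum p_i f(x_i)
S_step = stieltjes_sum(math.cos, step_phi, -5.0, 5.0, 10_000)
exact_step = sum(p * math.cos(x) for x, p in zip(values, probs))

# smooth phi(x) = x^2: the sum tends to the ordinary integral of cos(x) * 2x over (0, 1)
S_smooth = stieltjes_sum(math.cos, lambda x: x * x, 0.0, 1.0, 10_000)
exact_smooth = 2 * (math.sin(1.0) + math.cos(1.0) - 1.0)

print(S_step, exact_step)
print(S_smooth, exact_smooth)
```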
Stieltjes' integrals possess many properties of ordinary integrals. For instance, the mean-value theorem holds for them in the form: 

$$\int_a^b f(x)\,d\varphi(x) = f(\xi)[\varphi(b) - \varphi(a)]$$

where $a \le \xi \le b$. Also, if $f(x)$ has a continuous derivative, we have an analogue for the integration by parts 

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x)$$

where $df(x)$ means an ordinary differential and the integral in the right member is an ordinary integral. However, some important properties of ordinary integrals do not hold universally for Stieltjes' integrals. For instance, considered as functions of $b$ or $a$, they may have discontinuities. 

In the definition of Stieltjes' integral it was assumed that $a$ and $b$ were finite numbers. Stieltjes' integral over the interval $(-\infty, +\infty)$ is defined in an ordinary way as being the limit of 

$$\int_a^b f(x)\,d\varphi(x)$$

when $a$ and $b$ tend independently to $-\infty$ and $+\infty$, respectively. In other words, 

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x) = \lim \int_a^b f(x)\,d\varphi(x)\quad\text{when } a \to -\infty,\ b \to +\infty,$$

provided this limit exists. If it does not exist, the symbol 

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x)$$

has no meaning. 

The General Concept of Distribution 

4. The most general type of distribution function of probability, covering all imaginable cases, is given by a never decreasing function $F(t)$ defined for all real values of $t$ and varying from $F(-\infty) = 0$ to $F(+\infty) = 1$. If at points of discontinuity we set 

$$F(t) = F(t - 0),$$

then for any $t$ the probability of the inequality 

$$x < t$$

will be given by $F(t)$. Also, the probability of the inequalities 

$$t_1 \le x < t_2$$

will be 

$$F(t_2) - F(t_1).$$

The case of continuous $F(t)$, having a continuous derivative $f(t)$ (save for a finite set of points of discontinuity), corresponds to a continuous variable distributed with the density $f(t)$, since 

$$F(t) = \int_{-\infty}^{t} f(x)\,dx.$$

If $F(t)$ is a step function with a finite number of discontinuities, it characterizes the distribution of probability of a variable with a finite number of values. Finally, if $F(t)$ is a step function with an infinite set of discontinuities distributed without density, it corresponds to a variable whose values can be arranged in a sequence according to their magnitude. These are the most important types of variables considered in the calculus of probability, and for all of them the distribution function can be represented by Stieltjes' integral 

$$F(t) = \int_{-\infty}^{t} dF(x).$$

The mathematical expectation of any continuous function $f(t)$ is defined by Stieltjes' integral 

$$E[f(t)] = \int_{-\infty}^{\infty} f(t)\,dF(t)$$

provided it has a meaning. In particular, moments of the order $n$ ($n$ positive integer) and absolute moments of the order $\alpha$ ($\alpha$ real) are defined, respectively, by 

$$m_n = \int_{-\infty}^{\infty} t^n\,dF(t),\qquad \mu_\alpha = \int_{-\infty}^{\infty} |t|^\alpha\,dF(t),$$

and we always have 

$$|m_n| \le \mu_n.$$

Finally, 

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

is the characteristic function of distribution. Since the integral exists for any real $t$, this function is defined for all real values $t$ and satisfies the inequality 

$$|\varphi(t)| \le 1.$$


Inequalities for Moments 

5. Moments of any distribution satisfy certain inequalities, which it is important to know. They all are particular cases of the following very general inequality due to Liapounoff. 

Liapounoff's Inequality. Let $a, b, c$ be three real numbers satisfying the inequalities 

$$a \ge b \ge c \ge 0$$

and $\mu_a, \mu_b, \mu_c$ absolute moments of orders $a, b, c$ for an arbitrary distribution. Then the following inequality holds: 

$$\mu_b^{\,a-c} \le \mu_c^{\,a-b}\,\mu_a^{\,b-c}.$$

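Proof. a. Let $p_1, p_2, \ldots, p_n$; $x_1, x_2, \ldots, x_n$ be positive numbers and 

$$\psi(s) = p_1x_1^s + p_2x_2^s + \cdots + p_nx_n^s.$$

Then for arbitrary real numbers $s_1, s_2, \ldots, s_p$ the following inequality holds: 

$$\psi\!\left(\frac{s_1 + s_2 + \cdots + s_p}{p}\right)^{p} \le \psi(s_1)\psi(s_2)\cdots\psi(s_p). \tag{1}$$

For $p = 2$ this inequality follows immediately from the known inequality due to Cauchy: 

$$\left(\sum_{i=1}^n a_ib_i\right)^2 \le \sum_{i=1}^n a_i^2\sum_{i=1}^n b_i^2$$

by taking in it 

$$a_i = \sqrt{p_ix_i^{s_1}},\qquad b_i = \sqrt{p_ix_i^{s_2}}.$$

For $p = 4$ we have 

$$\psi\!\left(\frac{s_1 + s_2 + s_3 + s_4}{4}\right)^4 \le \psi\!\left(\frac{s_1 + s_2}{2}\right)^2\psi\!\left(\frac{s_3 + s_4}{2}\right)^2 \le \psi(s_1)\psi(s_2)\psi(s_3)\psi(s_4),$$

and continuing in the same manner we find in general that 

$$\psi\!\left(\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m}\right)^{2^m} \le \psi(s_1)\psi(s_2)\cdots\psi(s_{2^m}).$$

Let $m$ be taken so that $2^m > p$ and let us take in the last inequality 

$$s_{p+1} = s_{p+2} = \cdots = s_{2^m} = s = \frac{s_1 + s_2 + \cdots + s_p}{p}.$$

Since 

$$\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m} = \frac{ps + (2^m - p)s}{2^m} = s,$$

we shall have 

$$\psi(s)^{2^m} \le \psi(s_1)\psi(s_2)\cdots\psi(s_p)\,\psi(s)^{2^m - p},$$

whence 

$$\psi(s)^p \le \psi(s_1)\psi(s_2)\cdots\psi(s_p),$$

which is inequality (1). 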

b. Let $a \ge b \ge c \ge 0$ be integers. Taking $p = a - c$; $s_1 = s_2 = \cdots = s_{a-b} = c$; $s_{a-b+1} = \cdots = s_{a-c} = a$, we have 

$$\frac{s_1 + s_2 + \cdots + s_{a-c}}{a - c} = \frac{(a - b)c + (b - c)a}{a - c} = b$$

and consequently, by virtue of (1), 

$$\left(\sum_{i=1}^n p_ix_i^b\right)^{a-c} \le \left(\sum_{i=1}^n p_ix_i^c\right)^{a-b}\left(\sum_{i=1}^n p_ix_i^a\right)^{b-c}. \tag{2}$$

If $a = p/s$, $b = q/s$, $c = r/s$ are rational numbers ($a \ge b \ge c \ge 0$), it suffices to take, in (2), $p, q, r$ instead of $a, b, c$, replace $x_i$ by $x_i^{1/s}$, and raise both members to the power $1/s$ to ascertain that (2) holds for rational $a, b, c$. Finally, the passage to the limit makes it clear that (2) holds for real $a, b, c$, provided $a \ge b \ge c \ge 0$. 

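c. Let the interval $A$ to $B$ be subdivided into partial intervals by inserting numbers $t_1 < t_2 < \cdots < t_n$ between $A$ and $B$ and let 

$$p_0 = F(t_1) - F(A),\quad p_1 = F(t_2) - F(t_1),\quad \ldots,\quad p_n = F(B) - F(t_n),$$

$$x_0 = |A|,\quad x_1 = |t_1|,\quad \ldots,\quad x_n = |t_n|.$$

Then the three sums 

$$\sum_{i=0}^n p_ix_i^a,\qquad \sum_{i=0}^n p_ix_i^b,\qquad \sum_{i=0}^n p_ix_i^c$$

will tend to the respective limits 

$$\int_A^B |t|^a\,dF(t),\qquad \int_A^B |t|^b\,dF(t),\qquad \int_A^B |t|^c\,dF(t)$$

when all differences $t_1 - A, t_2 - t_1, \ldots, B - t_n$ tend to 0 uniformly. Hence, passing to the limit in (2), we get 

$$\left(\int_A^B |t|^b\,dF(t)\right)^{a-c} \le \left(\int_A^B |t|^c\,dF(t)\right)^{a-b}\left(\int_A^B |t|^a\,dF(t)\right)^{b-c}$$

and finally, letting $A$ tend to $-\infty$ and $B$ to $+\infty$, 

$$\left(\int_{-\infty}^{\infty} |t|^b\,dF(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty} |t|^c\,dF(t)\right)^{a-b}\left(\int_{-\infty}^{\infty} |t|^a\,dF(t)\right)^{b-c}$$

or 

$$\mu_b^{\,a-c} \le \mu_c^{\,a-b}\,\mu_a^{\,b-c},$$

as stated. 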


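Taking $b = \dfrac{a + c}{2}$, Liapounoff's inequality becomes 

$$\mu_{\frac{a+c}{2}}^{\,a-c} \le \mu_c^{\,\frac{a-c}{2}}\,\mu_a^{\,\frac{a-c}{2}},$$

whence 

$$\mu_{\frac{a+c}{2}}^{\,2} \le \mu_c\,\mu_a$$

for any two real positive numbers $a$ and $c$. If $k$ and $l$ are two positive integers and we take $c = 2k$, $a = 2l$, then 

$$\mu_{k+l}^2 \le \mu_{2k}\,\mu_{2l}$$

or 

$$m_{k+l}^2 \le m_{2k}\,m_{2l}$$

since 

$$|m_{k+l}| \le \mu_{k+l}\quad\text{and}\quad \mu_{2k} = m_{2k},\ \mu_{2l} = m_{2l}.$$

Another important inequality results if we take $c = 0$. Then, since $\mu_0 = 1$, 

$$\mu_b^{\,a} \le \mu_a^{\,b}$$

or 

$$\mu_b^{1/b} \le \mu_a^{1/a}$$

if $a > b > 0$. This amounts to 

$$\frac{\log \mu_b}{b} \le \frac{\log \mu_a}{a},$$

which is equivalent to the statement that 

$$\frac{\log \mu_x}{x}$$

is a never decreasing function of $x$ for positive $x$. 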
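The inequalities of this section can be verified numerically for any particular distribution. The sketch below uses an arbitrary four-point distribution of our own choosing and checks Liapounoff's inequality on a grid of orders $a \ge b \ge c \ge 0$, the inequality $m_{k+l}^2 \le m_{2k}m_{2l}$, and the monotonicity of $\mu_x^{1/x}$ (equivalently of $(\log\mu_x)/x$). A small relative tolerance absorbs rounding error.

```python
import itertools

# an arbitrary finite distribution chosen for illustration
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]


def abs_moment(alpha):
    """mu_alpha = sum p_i |x_i|^alpha (no x_i is 0, so mu_0 = 1)."""
    return sum(p * abs(x) ** alpha for x, p in zip(values, probs))


def moment(n):
    """m_n = sum p_i x_i^n for a positive integer n."""
    return sum(p * x ** n for x, p in zip(values, probs))


def liapounoff_holds(a, b, c, rel_tol=1e-12):
    """Check mu_b^(a-c) <= mu_c^(a-b) * mu_a^(b-c) for a >= b >= c >= 0."""
    lhs = abs_moment(b) ** (a - c)
    rhs = abs_moment(c) ** (a - b) * abs_moment(a) ** (b - c)
    return lhs <= rhs * (1 + rel_tol)


grid = [0.0, 0.5, 1.0, 1.7, 2.0, 3.0, 4.5]
lia_ok = all(liapounoff_holds(a, b, c)
             for a, b, c in itertools.product(grid, repeat=3) if a >= b >= c)
pairs_ok = all(moment(k + l) ** 2 <= moment(2 * k) * moment(2 * l) * (1 + 1e-12)
               for k in range(1, 5) for l in range(1, 5))
# mu_x^(1/x) for increasing x > 0; never decreasing, like (log mu_x)/x
norms = [abs_moment(x) ** (1 / x) for x in grid[1:]]
monotone = all(u <= v * (1 + 1e-12) for u, v in zip(norms, norms[1:]))
print(lia_ok, pairs_ok, monotone)
```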


Composition of Distribution Functions 

6. An important problem in the calculus of probability is to find the distribution function of the sum of several independent variables when distribution functions of these variables are known. It suffices to show how this problem can be solved for the sum of two independent variables. 

Let $x$ and $y$ be two independent variables with the corresponding distribution functions $F(t)$ and $G(t)$. To find the distribution function $H(t)$ of their sum 

$$z = x + y$$

is the same as to find the probability of the inequality 

$$x + y < t$$

for an arbitrary real number $t$. Here, for the sake of simplicity and in view of the applications we propose to consider later, we shall assume that one, at least, of the variables $x, y$ has continuous distribution with generally continuous density. 

At first, let both $x$ and $y$ have continuous distributions so that 

$$F(t) = \int_{-\infty}^{t} f(x)\,dx,\qquad G(t) = \int_{-\infty}^{t} g(y)\,dy.$$

The probability of the inequality 

$$x + y < t$$

according to the general principles stated in Chap. XII is expressed by the double integral 

$$H(t) = \iint f(x)g(y)\,dx\,dy$$

extended over the domain 

$$x + y < t.$$

Now, following ordinary rules, we can reduce this double integral to a repeated integral. To this end, for any fixed $x$ we integrate $g(y)$ between limits $-\infty$ and $t - x$, thus obtaining 

$$\int_{-\infty}^{t-x} g(y)\,dy = G(t - x).$$

Then, after multiplying by $f(x)$, we integrate the resulting expression between limits $-\infty$ and $+\infty$ for $x$. The final result will be 

$$H(t) = \int_{-\infty}^{\infty} G(t - x)f(x)\,dx$$

or, written as Stieltjes' integral, 

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x).$$

In the second place, let $x$ be a discontinuous variable with different values $x_1, x_2, x_3, \ldots$ and corresponding probabilities $p_1, p_2, p_3, \ldots$. For $x = x_i$ the inequality 

$$x + y < t$$

is equivalent to 

$$y < t - x_i$$

and the probability of this inequality is $G(t - x_i)$. Since the probability of $x = x_i$ is $p_i$, the compound probability of the two events 

$$x = x_i,\qquad x + y < t$$

will be 

$$p_iG(t - x_i).$$

The total probability $H(t)$ of the inequality 

$$x + y < t$$

will be expressed by the sum 

$$H(t) = \sum_i p_iG(t - x_i)$$

extended over all possible values of $x$. But this sum can again be written as Stieltjes' integral: 

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x). \tag{1}$$

In both cases we obtain the same expression for $H(t)$. Evidently $H(t)$ can also be defined as the mathematical expectation of $G(t - x)$: 

$$H(t) = E[G(t - x)]$$

taken with respect to the variable $x$. The important formula (1) is known as the formula for composition of distribution functions $F(t)$ and $G(t)$. 
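Formula (1) in its discrete form is easy to evaluate. The following sketch assumes, for illustration only, that $x$ has the four-point distribution of Fig. 19 and that $y$ is normal with mean 0 and standard deviation 1; it compares $H(t) = \sum p_i G(t - x_i)$ with the empirical frequency of $x + y < t$ in simulated pairs.

```python
import math
import random

# x: discrete, the four values of Fig. 19, each with probability 1/4
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]


def G(t):
    """Distribution function of y, assumed normal with mean 0 and sd 1."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def H(t):
    """Composition formula (1) for a discrete x: H(t) = sum_i p_i G(t - x_i)."""
    return sum(p * G(t - x) for x, p in zip(values, probs))


rng = random.Random(7)
sums = [rng.choices(values, weights=probs)[0] + rng.gauss(0.0, 1.0)
        for _ in range(100_000)]
for t in (-1.0, 0.5, 2.0):
    empirical = sum(z < t for z in sums) / len(sums)
    print(t, round(H(t), 4), round(empirical, 4))
```

By symmetry of this particular example about $t = 0.5$, $H(0.5)$ equals exactly $\tfrac12$.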

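Example. Let $x$ and $y$ be two normally distributed variables with means $= 0$ and respective standard deviations $\sigma_1$ and $\sigma_2$. Instead of using (1), it is better to write $H(t)$ as a double integral 

$$H(t) = \frac{1}{2\pi\sigma_1\sigma_2}\iint e^{-\frac{x^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}}\,dx\,dy$$

extended over the domain 

$$x + y < t.$$

To evaluate this integral, it is natural to introduce $x + y = z$ as a new variable and find constants $C, D, \alpha, \beta$ so as to have identically 

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = C(x + y)^2 + D(\alpha x + \beta y)^2,$$

whence one easily finds 

$$C = D = \frac{1}{2(\sigma_1^2 + \sigma_2^2)},\qquad \alpha = \frac{\sigma_2}{\sigma_1},\qquad \beta = -\frac{\sigma_1}{\sigma_2},$$

and 

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = \frac{1}{2(\sigma_1^2 + \sigma_2^2)}\left[(x + y)^2 + \left(\frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y\right)^2\right].$$

The Jacobian of 

$$z = x + y,\qquad u = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y$$

with respect to $x, y$ being 

$$\begin{vmatrix} 1 & 1\\[2pt] \dfrac{\sigma_2}{\sigma_1} & -\dfrac{\sigma_1}{\sigma_2} \end{vmatrix} = -\left(\frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1}\right) = -\frac{\sigma_1^2 + \sigma_2^2}{\sigma_1\sigma_2},$$

$H(t)$ can be presented as the double integral 

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)}\iint e^{-\frac{z^2 + u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz\,du$$

with the domain of integration defined by a single inequality: 

$$z < t.$$

Hence, 

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du$$

or 

$$H(t) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz,$$

since 

$$\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du = \sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}.$$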


The expression obtained for $H(t)$ leads to a remarkable conclusion: The sum of two normally distributed variables with means $= 0$ and standard deviations $\sigma_1$ and $\sigma_2$ is also a normally distributed variable with the mean $= 0$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. If the means of $x$ and $y$ are $a_1$ and $a_2$, then evidently $z$ will be normally distributed with the mean $a = a_1 + a_2$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. 
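The same conclusion can be reached numerically from formula (1) in the continuous case, $H(t) = \int G(t - x) f(x)\,dx$. This sketch evaluates that integral by the midpoint rule for the illustrative values $\sigma_1 = 1$, $\sigma_2 = 2$, truncating the range of $x$ at $12\sigma_1$ (an assumption whose effect is negligible), and compares the result with the normal distribution function of standard deviation $\sqrt{\sigma_1^2 + \sigma_2^2}$.

```python
import math


def normal_pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(t, sigma):
    return 0.5 * (1 + math.erf(t / (sigma * math.sqrt(2))))


def composed_cdf(t, s1, s2, n=4000, width=12.0):
    """H(t) = integral of G(t - x) f(x) dx (midpoint rule on |x| < width*s1),
    with f the normal density (sd s1) and G the normal CDF (sd s2)."""
    lo = -width * s1
    step = 2 * width * s1 / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += normal_cdf(t - x, s2) * normal_pdf(x, s1)
    return total * step


s1, s2 = 1.0, 2.0
sigma = math.hypot(s1, s2)           # sqrt(s1**2 + s2**2)
for t in (-3.0, 0.0, 1.5):
    print(t, composed_cdf(t, s1, s2), normal_cdf(t, sigma))
```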

Repeated application of this result leads to the following important theorem: 

If $x_1, x_2, \ldots, x_n$ are normally distributed independent variables with means $a_1, a_2, \ldots, a_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$, then their sum 

$$z = x_1 + x_2 + \cdots + x_n$$

is again normally distributed with the mean $a = a_1 + a_2 + \cdots + a_n$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$. 

Finally, any linear function 

$$u = c_1x_1 + c_2x_2 + \cdots + c_nx_n$$

is normally distributed with the mean $a = c_1a_1 + c_2a_2 + \cdots + c_na_n$ and the standard deviation $\sigma = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}$. In particular, the arithmetic mean 

$$\frac{x_1 + x_2 + \cdots + x_n}{n}$$

of identical normally distributed variables with the mean $a$ and the standard deviation $\sigma$ is normally distributed about the mean $a$ and with the standard deviation $\sigma/\sqrt{n}$. Hence, the conclusion may be drawn that the probability $P$ of the inequality 

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - a\right| \le \varepsilon$$

is given by 

$$P = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\int_{-\varepsilon}^{\varepsilon} e^{-\frac{nx^2}{2\sigma^2}}\,dx = \frac{2}{\sqrt{\pi}}\int_0^{\frac{\varepsilon\sqrt{n}}{\sigma\sqrt{2}}} e^{-t^2}\,dt$$

and rapidly approaches 1 as $n$ increases. This is a more definite form of the law of large numbers applied to normally distributed (identical or equal) variables. 


Determination of Distribution When Its Characteristic Function Is Given 

7. One of the most important conclusions to be drawn from the preceding considerations is that the distribution function of probability is uniquely determined by the characteristic function. The known proofs of this fact are rather subtle, owing to the use of conditionally convergent integrals. However, such integrals can be avoided by resorting to an ingenious device due to Liapounoff. In the general case, the distribution function of a variable $x$ has discontinuities. To avoid the bad effect of these discontinuities, Liapounoff introduces a continuous variable $y$ that, with reasonable probability, can have values only in the vicinity of 0. It may be surmised, therefore, that the continuous distribution function of the sum $x + y$ will approximately represent that of $x$ and, by disposing of a parameter involved in the distribution function of $y$, will tend to it as a limit. To make these explanations more definite, let $y$ be a normally distributed variable whose distribution function is 

$$G(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t} e^{-\frac{z^2}{h^2}}\,dz.$$

When $h$ is small, the probabilities of any one of the inequalities 

$$y > \delta,\qquad y < -\delta \qquad (\delta > 0)$$

will be extremely small and even will tend to 0 when $h$ tends to 0. Hence, the distribution function $H(t)$ of the sum $x + y$ is likely to tend to $F(t)$ as a limit when $h$ tends to 0. 

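To prove this in all rigor, we apply the composition formula (Sec. 6) to our case. We obtain the following expression for $H(t)$: 

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{t-x} e^{-\frac{z^2}{h^2}}\,dz$$

or, in more convenient form, 

$$H(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du;$$

and furthermore, integrating by parts, 

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$

The integral in the right member can be split into three parts 

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\varepsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t-\varepsilon}^{t+\varepsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t+\varepsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$

Now, for positive $T$, 

$$\frac{1}{\sqrt{\pi}}\int_T^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-T^2}.$$

Making use of this inequality and of $0 \le F(x) \le 1$, we find that 

$$\frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\varepsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx < \frac{1}{\sqrt{\pi}}\int_{\varepsilon/h}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-\frac{\varepsilon^2}{h^2}},$$

and similarly 

$$\frac{1}{h\sqrt{\pi}}\int_{t+\varepsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx < \frac{1}{2}e^{-\frac{\varepsilon^2}{h^2}},$$

so that 

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}F(t + u)\,du + \frac{1}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}F(t - u)\,du + \theta e^{-\frac{\varepsilon^2}{h^2}};\qquad 0 < \theta < 1.$$

Given an arbitrary $\sigma > 0$, the number $\varepsilon$ can be taken so small that 

$$0 \le F(t + u) - F(t + 0) < \sigma,\qquad 0 \le F(t - 0) - F(t - u) < \sigma$$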





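for $0 < u < \varepsilon$, whence 

$$\left|\frac{1}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}F(t + u)\,du - \frac{F(t + 0)}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}\,du\right| < \sigma,$$

$$\left|\frac{1}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}F(t - u)\,du - \frac{F(t - 0)}{h\sqrt{\pi}}\int_0^{\varepsilon} e^{-\frac{u^2}{h^2}}\,du\right| < \sigma,$$

and 

$$H(t) = \frac{F(t + 0) + F(t - 0)}{\sqrt{\pi}}\int_0^{\varepsilon/h} e^{-u^2}\,du + \theta'\left(2\sigma + e^{-\frac{\varepsilon^2}{h^2}}\right);\qquad |\theta'| < 1.$$

On the other hand, 

$$\frac{1}{\sqrt{\pi}}\int_0^{\varepsilon/h} e^{-u^2}\,du = \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_{\varepsilon/h}^{\infty} e^{-u^2}\,du = \frac{1}{2} - \frac{\theta''}{2}e^{-\frac{\varepsilon^2}{h^2}};\qquad 0 < \theta'' < 1,$$

so that finally 

$$\left|H(t) - \frac{F(t + 0) + F(t - 0)}{2}\right| < 2\sigma + 2e^{-\frac{\varepsilon^2}{h^2}},$$

and for all sufficiently small $h$ ($\varepsilon$ being kept fixed) 

$$\left|H(t) - \frac{F(t + 0) + F(t - 0)}{2}\right| < 3\sigma;$$

that is, 

$$\lim_{h \to 0} H(t) = \frac{F(t + 0) + F(t - 0)}{2}$$

or, if $t$ is a point of continuity, 

$$\lim_{h \to 0} H(t) = F(t).$$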

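Now we must find another analytical representation for $H(t)$. To this end we consider the difference 

$$H(t) - H(0) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du,$$

and, to represent in a convenient way the inner integral, we make use of the known integral 

$$e^{-u^2} = \frac{h}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4} - ihuv}\,dv.$$

Multiplying both sides by $du$ and integrating between $-x/h$ and $(t - x)/h$ we find 

$$\frac{1}{\sqrt{\pi}}\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv$$

and 

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv.$$

The next step is to reverse the order of integrations, an operation which can be easily justified in this case. The result will be: 

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1 - e^{-ivt}}{iv}\,dv\int_{-\infty}^{\infty} e^{ivx}\,dF(x)$$

or 

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1 - e^{-ivt}}{iv}\,dv$$

since 

$$\varphi(v) = \int_{-\infty}^{\infty} e^{ivx}\,dF(x).$$

Now, taking the limit of $H(t)$ for $h$ converging to 0, we have at any point of continuity of $F(t)$ 

$$F(t) = C + \frac{1}{2\pi}\lim_{h \to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1 - e^{-ivt}}{iv}\,dv \tag{2}$$

where the constant 

$$C = \frac{F(+0) + F(-0)}{2}$$

is determined by the condition $F(-\infty) = 0$. Thus, the distribution function is completely determined by (2) at all points of continuity when the characteristic function $\varphi(v)$ is given. 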


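Example 1. Let us apply (2) to find the distribution corresponding to the characteristic function 

$$\varphi(v) = e^{-\frac{\sigma^2v^2}{2}}.$$

Since in this case the integral whose limit we seek is uniformly convergent with respect to $h$, we find simply 

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi}\int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv.$$

On the other hand (Chap. VII, page 128), 

$$\int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\cos tv\,dv = \frac{\sqrt{2\pi}}{2\sigma}\,e^{-\frac{t^2}{2\sigma^2}},$$

so that 

$$\int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{\sqrt{2\pi}}{2\sigma}\int_0^t e^{-\frac{u^2}{2\sigma^2}}\,du$$

and 

$$F(t) = C + \frac{1}{\sigma\sqrt{2\pi}}\int_0^t e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Taking $t \to -\infty$, the condition $F(-\infty) = 0$ gives 

$$C = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du,$$

and so finally 

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Naturally, we find a normal distribution with the standard deviation $\sigma$ (compare page 270). 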

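Example 2. What is the distribution determined by the characteristic function $\varphi(v) = e^{-a|v|}$, $a > 0$? 

As in the preceding example we find that 

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-a|v|}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv.$$

But 

$$\frac{d}{dt}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \int_0^{\infty} e^{-av}\cos tv\,dv = \frac{a}{a^2 + t^2},$$

whence 

$$\frac{1}{\pi}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{a}{\pi}\int_0^{t}\frac{dx}{a^2 + x^2}.$$

Thus 

$$F(t) = C + \frac{a}{\pi}\int_0^{t}\frac{dx}{a^2 + x^2},$$

and the condition $F(-\infty) = 0$ gives $C = \tfrac12$, since 

$$\frac{a}{\pi}\int_{-\infty}^{0}\frac{dx}{a^2 + x^2} = \frac{1}{2},$$

so that finally 

$$F(t) = \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2 + x^2}.$$

Naturally we find the same distribution as that considered in Example 2, page 243. 
Sometimes it is called "Cauchy's distribution" with the parameter $a$. 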
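Both examples can be reproduced numerically from formula (2). Since the integrals converge uniformly with respect to $h$ in these cases, the sketch below takes $h = 0$, truncates the range of $v$ to $(-V, V)$, and uses the midpoint rule; the constant $C = \tfrac12$ is taken from the examples (both distributions are symmetric about 0). The truncation limit, the number of points, and the function names are assumptions chosen for these two characteristic functions only.

```python
import cmath
import math


def invert(phi, t, C=0.5, V=60.0, n=60_000):
    """F(t) = C + (1/2pi) * integral over (-V, V) of phi(v) (1 - e^{-ivt}) / (iv) dv,
    i.e. formula (2) with h = 0, evaluated by the midpoint rule.
    n is even, so no midpoint falls on v = 0."""
    step = 2 * V / n
    total = 0j
    for k in range(n):
        v = -V + (k + 0.5) * step
        total += phi(v) * (1 - cmath.exp(-1j * v * t)) / (1j * v)
    return C + (total * step / (2 * math.pi)).real


def cauchy_cf(v, a=1.0):
    return math.exp(-a * abs(v))


def normal_cf(v, sigma=1.0):
    return math.exp(-sigma * sigma * v * v / 2)


t = 1.0
print(invert(cauchy_cf, t), 0.5 + math.atan(t) / math.pi)              # Cauchy, a = 1
print(invert(normal_cf, t), 0.5 * (1 + math.erf(t / math.sqrt(2))))    # normal, sigma = 1
```

For characteristic functions that decay slowly, the cutoff $V$ would have to be larger, and the factor $e^{-h^2v^2/4}$ with a small positive $h$ would become useful.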

Composition of Characteristic Functions 

8. Having $n$ independent variables $x_1, x_2, \ldots, x_n$ whose characteristic functions are $\varphi_1(t), \varphi_2(t), \ldots, \varphi_n(t)$, the product 

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)$$

is the characteristic function of their sum 

$$s = x_1 + x_2 + \cdots + x_n.$$

In fact, the characteristic function of $s$ is by definition 

$$\varphi(t) = E\left(e^{its}\right) = E\left(e^{itx_1}e^{itx_2}\cdots e^{itx_n}\right).$$

Since $x_1, x_2, \ldots, x_n$ are independent variables, the expectation of the product 

$$e^{itx_1}e^{itx_2}\cdots e^{itx_n}$$

is equal to the product of the expectations of the factors, whence 

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t).$$

This simple theorem is of great importance since it determines the characteristic function of the sum of independent variables and indirectly its function of distribution. 
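The multiplication rule can be checked exactly for variables with a finite number of values. The sketch below, with three small distributions chosen for illustration, computes the distribution of the sum by enumerating all combinations and compares its characteristic function with the product of the individual ones; the helper names are hypothetical.

```python
import cmath
import itertools


def char_fn(dist, t):
    """phi(t) = E(e^{itx}) = sum p e^{itx} for a finite distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def sum_distribution(*dists):
    """Distribution of x_1 + ... + x_n for independent finite variables."""
    out = {}
    for combo in itertools.product(*(d.items() for d in dists)):
        value = sum(x for x, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        out[value] = out.get(value, 0.0) + prob
    return out


x1 = {-2: 0.25, 0: 0.25, 1: 0.25, 3: 0.25}
x2 = {0: 0.7, 1: 0.3}
x3 = {-1: 0.5, 1: 0.5}
s = sum_distribution(x1, x2, x3)
for t in (0.3, 1.0, 2.5):
    direct = char_fn(s, t)
    product = char_fn(x1, t) * char_fn(x2, t) * char_fn(x3, t)
    print(t, abs(direct - product))   # differences are at the level of rounding error
```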

9. A few examples will illustrate the preceding remark.

Example 1. Consider $n$ independent normally distributed variables $x_1, x_2, \ldots, x_n$ with means $= 0$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$. Their characteristic functions are

$$\varphi_k(t) = e^{-\frac{\sigma_k^2 t^2}{2}};\qquad k = 1, 2, \ldots, n,$$

and the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}},$$

where

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.$$

Hence $s$ is a normally distributed variable with the mean 0 and the standard deviation

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2},$$

as we found previously by a method involving a considerable amount of calculation.

Example 2. Independent variables $x_1, x_2, \ldots, x_n$ have Cauchy's distributions with parameters $a_1, a_2, \ldots, a_n$. Since the characteristic function of $x_k$ is

$$\varphi_k(t) = e^{-a_k|t|},$$

the characteristic function of the sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-a|t|},$$

where

$$a = a_1 + a_2 + \cdots + a_n.$$

Hence, $s$ again has Cauchy's distribution with the parameter $a_1 + a_2 + \cdots + a_n$.
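The stability of Cauchy's distribution under addition can be seen in a small simulation. The sketch below draws Cauchy variables by inverting $F(t) = \frac12 + \frac1\pi\arctan\frac ta$. The parameters, seed and sample size are illustrative assumptions. It then checks two probabilities that Cauchy's law with parameter $a_1 + a_2 + a_3$ predicts for the sum: $P(s < a) = \frac34$ and $P(|s| < a) = \frac12$.

```python
import math
import random


def cauchy_sample(a, rng):
    """One draw from Cauchy's distribution with parameter a, obtained by
    inverting F(t) = 1/2 + arctan(t/a)/pi."""
    return a * math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(2024)
params = [0.5, 1.0, 1.5]   # illustrative a_1, a_2, a_3
a_total = sum(params)      # predicted parameter of the sum: 3.0
trials = 200_000

sums = [sum(cauchy_sample(a, rng) for a in params) for _ in range(trials)]
# For Cauchy(a_total): P(s < a_total) = 3/4 and P(|s| < a_total) = 1/2.
frac_below = sum(s < a_total for s in sums) / trials
frac_inside = sum(abs(s) < a_total for s in sums) / trials
print(frac_below, frac_inside)
```

Both empirical fractions agree with the predicted values within sampling error.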
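Example 3. Let $x_1, x_2, \ldots, x_n$ be independent variables with uniform distribution of probability in the interval $(0, 1)$. The characteristic function of any one of them is

$$\int_0^1 e^{itx}\,dx = \frac{e^{it} - 1}{it}.$$

Hence, the characteristic function of their sum $s$ will be

$$\left(\frac{e^{it} - 1}{it}\right)^{n}.$$

The distribution function of $s$ is given by

$$F(t) = C + \frac{1}{2\pi}\lim_{h\to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left(\frac{e^{iv} - 1}{iv}\right)^{n}\frac{1 - e^{-ivt}}{iv}\,dv$$

and, since the integral again is uniformly convergent,

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{e^{iv} - 1}{iv}\right)^{n}\frac{1 - e^{-ivt}}{iv}\,dv.$$

The evaluation of this integral presents certain difficulties. To avoid them, we notice that the integrand, considered as a function of a complex variable $v$, is holomorphic everywhere. Hence, we can substitute for the rectilinear path of integration the path $\Gamma$ shown in Fig. 20.

Fig. 20.

Now it is easy to show that, integrating over the path $\Gamma$, we have

$$f(p) = \int_\Gamma \frac{e^{ipz}}{(iz)^{n+1}}\,dz = 0\quad\text{if } p \ge 0;\qquad f(p) = -\frac{2\pi p^{n}}{n!}\quad\text{if } p < 0.$$

The integral

$$\int_\Gamma\left(\frac{e^{iz} - 1}{iz}\right)^{n}\frac{dz}{iz} = \sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}f(k),$$

being a linear combination of integrals of the type $f(p)$ with $p \ge 0$, reduces to 0. Similarly,

$$\int_\Gamma\left(\frac{e^{iz} - 1}{iz}\right)^{n}\frac{e^{-itz}\,dz}{iz} = \sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}f(k - t) = -\frac{2\pi}{n!}\sum_{k<t}(-1)^{k}\binom{n}{k}(t - k)^{n},$$

or, in explicit form,

$$-\frac{2\pi}{n!}\left[t^n - \frac{n}{1}(t-1)^n + \frac{n(n-1)}{1\cdot 2}(t-2)^n - \cdots\right].$$

Referring to the above expression of $F(t)$, we find that

$$F(t) = C + \frac{1}{n!}\left[t^n - \frac{n}{1}(t-1)^n + \frac{n(n-1)}{1\cdot 2}(t-2)^n - \cdots\right].$$

The constant $C = 0$, since $F(t)$ and the sum in the right member both vanish for $t = 0$. The final expression of $F(t)$ is, therefore:

$$F(t) = \frac{1}{1\cdot 2\cdot 3\cdots n}\left[t^n - \frac{n}{1}(t-1)^n + \frac{n(n-1)}{1\cdot 2}(t-2)^n - \cdots\right].$$

The series in the right member is continued as long as the arguments remain positive. Such is the probability that the sum

$$x_1 + x_2 + \cdots + x_n$$

of $n$ independent variables, uniformly distributed throughout the interval $(0, 1)$, will be less than $t$. The above expression is due to Laplace, who, however, obtained it in quite a different manner.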
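Laplace's formula is easy to evaluate directly. The sketch below implements $F(t) = \frac{1}{n!}\sum_{k<t}(-1)^k\binom nk(t-k)^n$, keeping only terms whose argument $t - k$ is positive. It checks two values that can be verified by hand and compares one value with a Monte Carlo estimate. The function name, seed and sample size are our own illustrative choices.

```python
import math
import random


def laplace_uniform_sum_cdf(t, n):
    """P(x_1 + ... + x_n < t) for n independent variables uniform on (0, 1), by
    Laplace's formula (1/n!) * sum_{k < t} (-1)^k C(n, k) (t - k)^n."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0  # the series also gives 1 here; return it exactly
    total = sum((-1) ** k * math.comb(n, k) * (t - k) ** n
                for k in range(n + 1) if t - k > 0)
    return total / math.factorial(n)


print(laplace_uniform_sum_cdf(1.5, 2))   # 0.875
print(laplace_uniform_sum_cdf(2.5, 5))   # 0.5, by symmetry about n/2

# Monte Carlo comparison; n, t and the sample size are illustrative choices.
rng = random.Random(7)
n, t, trials = 4, 1.7, 200_000
hits = sum(sum(rng.random() for _ in range(n)) < t for _ in range(trials))
mc_estimate = hits / trials
print(mc_estimate, laplace_uniform_sum_cdf(t, n))
```

Restricting the sum to $t - k > 0$ is exactly the rule "continued as long as the arguments remain positive."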


Problems for Solution 
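1. Prove directly the inequality

$$\mu_{\frac{a+b}{2}}^{2} \le \mu_a\,\mu_b$$

for absolute moments.

Hint: The quadratic form in $\lambda$, $\mu$

$$\int_{-\infty}^{\infty}\left(\lambda|z|^{\frac a2} + \mu|z|^{\frac b2}\right)^{2}d\varphi(z)$$

is definite or semidefinite. Show that the equality sign cannot hold if $\varphi(z)$ has at least two points of increase $\alpha$, $\beta$ such that $\alpha/\beta$ is neither 0 nor $\pm 1$.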

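2. Let $x_1, x_2, \ldots, x_n$ be $n$ variables. Denoting the absolute moment of the order $\alpha$ for $x_i$ by $\mu_\alpha^{(i)}$ and by $\omega_s$ the quotient

$$\omega_s = \frac{\mu_{2+s}^{(1)} + \mu_{2+s}^{(2)} + \cdots + \mu_{2+s}^{(n)}}{\left(\mu_2^{(1)} + \mu_2^{(2)} + \cdots + \mu_2^{(n)}\right)^{1+\frac s2}},$$

prove that

$$\omega_s^{\frac1s} \le \omega_{s'}^{\frac1{s'}}\qquad\text{if } s' > s > 0.$$

Hint: Use Liapounoff's inequality.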

3. A variable is distributed over the interval $(0, +\infty)$ with a decreasing density of probability $f(x)$. Show that in this case the moments $M_2$ and $M_4$ satisfy the inequality

$$M_2^2 \le \frac59\,M_4\qquad\text{(Gauss)}$$

and that, in general,

$$\left[(\mu+1)M_\mu\right]^{\frac1\mu} \le \left[(\nu+1)M_\nu\right]^{\frac1\nu}\qquad\text{if } \nu > \mu > 0.$$

Indication of the Proof. Show first that the existence of the integral

$$\int_0^{\infty}x^{\nu}f(x)\,dx,$$

in case $f(x)$ is a positive and decreasing function, implies

$$\lim_{a\to+\infty}a^{\nu+1}f(a) = 0.$$

Hence, deduce that

$$\int_0^{\infty}x\,d\psi(x) = 1,\qquad \int_0^{\infty}x^{\mu+1}\,d\psi(x) = (\mu+1)M_\mu,\qquad \int_0^{\infty}x^{\nu+1}\,d\psi(x) = (\nu+1)M_\nu,$$

where $\psi(x) = f(0) - f(x)$, and, finally, apply the inequality

4. Using the composition formula (1), page 269, prove Laplace’s formula on 
page 278 by mathematical induction. 

5. Prove that the distribution function of a variable whose characteristic function $\varphi(t)$ is given can be determined by the formula

$$F(t) = C + \lim_{h\to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi(v)\,\frac{1-e^{-ivt}}{iv}\cdot\frac{dv}{1+h^2v^2}.$$

Hint: In carrying out Liapounoff's idea, take an auxiliary variable with the distribution



Also make use of the integral 


1 r * e“*^dx 


Many definite integrals can be evaluated using the relation between characteristic and distribution functions, as the following example shows.

6. Let $x$ be distributed over $(-\infty, +\infty)$ with the density $\frac12 e^{-|x|}$. The characteristic function being in this case

$$\varphi(v) = \frac{1}{1+v^2},$$

we find

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1-e^{-ivt}}{iv\,(1+v^2)}\,dv = \frac12\int_{-\infty}^{t}e^{-|x|}\,dx,$$

whence

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tv}{1+v^2}\,dv = e^{-|t|},$$

an integral due to Laplace.
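Laplace's integral $\frac1\pi\int_{-\infty}^{\infty}\frac{\cos tv}{1+v^2}dv = e^{-|t|}$ can be confirmed numerically. The sketch below is a minimal illustration, assuming Simpson's rule on $[0, 400]$ (doubling for evenness). With this cutoff the neglected oscillatory tail is of order $1/(t \cdot 400^2)$, which is small for the values of $t$ tried. The helper names are hypothetical.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def laplace_integral(t, upper=400.0, steps=40000):
    """(1/pi) * int_{-inf}^{inf} cos(t v) / (1 + v^2) dv, computed as twice the
    integral over [0, upper]; the truncation error is of order 1/(t * upper^2)."""
    return 2.0 * simpson(lambda v: math.cos(t * v) / (1.0 + v * v), 0.0, upper, steps) / math.pi


for t in (0.5, 1.0, 2.0):
    print(t, laplace_integral(t), math.exp(-abs(t)))
```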

7. A variable is said to have Poisson's distribution if it can have only integral values $0, 1, 2, \ldots$ and the probability of $x$ is

$$\frac{a^{x}e^{-a}}{x!};$$

the quantity $a$ is called the "parameter" of the distribution. If $n$ independent variables have Poisson's distributions with parameters $a_1, a_2, \ldots, a_n$, show that their sum also has Poisson's distribution, the parameter of which is $a_1 + a_2 + \cdots + a_n$.
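The statement of Problem 7 can be verified by direct convolution. The sketch below uses illustrative parameters. It convolves truncated Poisson probability lists; entries up to the truncation index are exact despite the truncation, since they involve only smaller indices. It then compares the result with Poisson's distribution whose parameter is the sum of the parameters.

```python
import math


def poisson_pmf(a, kmax):
    """[P(x = 0), ..., P(x = kmax)] for Poisson's distribution with parameter a."""
    return [a ** k * math.exp(-a) / math.factorial(k) for k in range(kmax + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


params = [0.7, 1.8, 2.5]      # illustrative a_1, a_2, a_3
kmax = 40                     # entries 0..kmax of each convolution are exact
dist = poisson_pmf(params[0], kmax)
for a in params[1:]:
    dist = convolve(dist, poisson_pmf(a, kmax))

target = poisson_pmf(sum(params), kmax)
gap = max(abs(dist[k] - target[k]) for k in range(kmax + 1))
print(gap)
```

The characteristic-function proof is shorter: the product of the $e^{a_k(e^{it}-1)}$ is $e^{(a_1+\cdots+a_n)(e^{it}-1)}$.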
8. Prove the following result:

$$\frac12 + \frac1\pi\int_0^{\infty}\left(\frac{\sin v}{v}\right)^{n}\frac{\sin tv}{v}\,dv = \frac{1}{2\cdot 4\cdot 6\cdots 2n}\left[(t+n)^n - n(t+n-2)^n + \frac{n(n-1)}{1\cdot 2}(t+n-4)^n - \cdots\right],$$

the series being continued as long as the arguments remain positive.

Hint: Consider the sum of $n$ uniformly distributed variables in the interval $(-1, +1)$ and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute value of the sum of $n$ uniformly distributed variables in the interval $(-1, +1)$.

Ans.

$$E|x_1 + x_2 + \cdots + x_n| = \frac{1}{2^{n-1}(n+1)!}\left[n^{n+1} - \frac{n}{1}(n-2)^{n+1} + \frac{n(n-1)}{1\cdot 2}(n-4)^{n+1} - \cdots\right],$$

the series being continued as long as the arguments remain positive.

Hint: Apply Laplace's formula on page 278, conveniently modified, to express the expectation of $x_1 + x_2 + \cdots + x_n$ and that of $|x_1 + x_2 + \cdots + x_n|$.

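10. Show that, under the same conditions as in Prob. 9,

$$E|x_1 + x_2 + \cdots + x_n| = \frac{n}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^{n-1}\frac{\sin t - t\cos t}{t^3}\,dt.$$

Hint: Prove and use the following formula:

$$\frac1\pi\int_{-\infty}^{\infty}\frac{e^{iwx} - 1 - iwx}{x^2}\,dx = -|w|.$$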
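As a numerical cross-check of Problems 9 and 10, the sketch below compares two expressions for $E|x_1+\cdots+x_n|$ with the $x_i$ uniform on $(-1,+1)$. The first is the closed form $\frac{1}{2^{n-1}(n+1)!}\sum_{k<n/2}(-1)^k\binom nk(n-2k)^{n+1}$. It follows from Laplace's formula by writing the sum as $2y - n$, with $y$ a sum of uniform $(0,1)$ variables; this derivation is ours and is not quoted from the book's answer. The second is the integral $\frac n\pi\int_{-\infty}^\infty(\frac{\sin t}{t})^{n-1}\frac{\sin t - t\cos t}{t^3}dt$, evaluated by Simpson's rule. The truncation at $t = 400$ is an assumption adequate for $n \ge 2$.

```python
import math


def mean_abs_sum_closed(n):
    """E|x_1 + ... + x_n|, x_i independent uniform on (-1, +1), via Laplace's formula:
    (1 / (2^(n-1) (n+1)!)) * sum_{k < n/2} (-1)^k C(n, k) (n - 2k)^(n+1)."""
    total = sum((-1) ** k * math.comb(n, k) * (n - 2 * k) ** (n + 1)
                for k in range(n + 1) if n - 2 * k > 0)
    return total / (2 ** (n - 1) * math.factorial(n + 1))


def integrand(t, n):
    """(sin t / t)^(n-1) * (sin t - t cos t) / t^3, using series values near t = 0."""
    if abs(t) < 1e-3:
        sinc = 1.0 - t * t / 6.0
        kernel = 1.0 / 3.0 - t * t / 30.0
    else:
        sinc = math.sin(t) / t
        kernel = (math.sin(t) - t * math.cos(t)) / t ** 3
    return sinc ** (n - 1) * kernel


def mean_abs_sum_integral(n, upper=400.0, steps=40000):
    """(n/pi) * int_{-inf}^{inf} integrand dt = (2n/pi) * int_0^upper integrand dt (Simpson)."""
    h = upper / steps
    total = integrand(0.0, n) + integrand(upper, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h, n)
    return 2.0 * n / math.pi * total * h / 3.0


for n in (2, 3, 4):
    print(n, mean_abs_sum_closed(n), mean_abs_sum_integral(n))
```

For $n = 2$ the closed form gives $2/3$ and for $n = 3$ it gives $13/16$. Both can be confirmed by elementary integration of the triangular and piecewise-quadratic densities.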

11. Let $x_1$ and $x_2$ be two identical and normally distributed variables with the mean $= 0$ and the standard deviation $\sigma$. If $x$ is defined as the greater of the values $|x_1|$, $|x_2|$, that is,

$$x = \max\left(|x_1|, |x_2|\right),$$

find the mean value of $x$ as well as that of $x^2$. Ans.
12. Let

$$x = \min\left(|x_1|, |x_2|, \ldots, |x_n|\right),$$

where $x_1, x_2, \ldots, x_n$ are identical normally distributed variables with the mean 0 and the standard deviation $\sigma$. Find the mean value of $x$. Ans. Setting for brevity

$$\theta(t) = \frac{\sqrt2}{\sigma\sqrt\pi}\int_0^{t}e^{-\frac{u^2}{2\sigma^2}}\,du,$$

we have

$$E(x) = \int_0^{\infty}\left[1 - \theta(t)\right]^{n}dt.$$

In particular, for $n = 2$,

$$E(x) = \frac{2\sigma}{\sqrt\pi}\left(\sqrt2 - 1\right).$$

For large $n$, asymptotically,


E{x) 




n + r 
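The answer to Problem 12 can be evaluated numerically. Since $1 - \theta(t) = P(|x_i| \ge t) = \operatorname{erfc}\!\left(t/(\sigma\sqrt2)\right)$, the sketch below integrates $\operatorname{erfc}(\cdot)^n$ by Simpson's rule and compares the result with the closed forms for $n = 2$ and for $n = 1$ (where $E|x_1| = \sigma\sqrt{2/\pi}$). It assumes independent $x_i$, a cutoff at $12\sigma$, and an illustrative value of $\sigma$.

```python
import math


def mean_min_abs_normal(n, sigma=1.0, upper_sd=12.0, steps=24000):
    """E(min(|x_1|, ..., |x_n|)) = int_0^inf [1 - theta(t)]^n dt for independent
    normal x_i with mean 0 and standard deviation sigma, where
    1 - theta(t) = erfc(t / (sigma * sqrt(2))).  Simpson's rule on
    [0, upper_sd * sigma]; the neglected tail is of order erfc(upper_sd / sqrt(2))."""
    upper = upper_sd * sigma
    h = upper / steps

    def f(t):
        return math.erfc(t / (sigma * math.sqrt(2.0))) ** n

    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0


sigma = 1.3
exact_n2 = 2 * sigma / math.sqrt(math.pi) * (math.sqrt(2) - 1)
print(mean_min_abs_normal(2, sigma), exact_n2)
print(mean_min_abs_normal(1, sigma), sigma * math.sqrt(2 / math.pi))
```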


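13. A variable with the mean $= 0$ and the standard deviation 1 is called a "reduced variable." By changing the origin and the unit of measurement, any variable can be made reduced. For, if $x$ has the mean $a$ and the standard deviation $\sigma$, the variable

$$u = \frac{x - a}{\sigma}$$

is reduced. The distribution function of the reduced variable $u$ can be called the "reduced law of distribution."

As we have seen, independent variables $x_1$ and $x_2$ with normal distributions have the same reduced law of distribution, as does their sum. The question may be raised: Is the normal law of distribution the unique law possessing this property? (G. Pólya.)

Solution. Let $x_1$, $x_2$ be two independent variables for which the second moment of the distribution exists, so that we can speak of their means and standard deviations. Let $x_1$ have the mean $a_1$ and the standard deviation $\sigma_1$; likewise, let $a_2$ and $\sigma_2$ be the mean and the standard deviation of $x_2$. The three reduced variables

$$\frac{x_1 - a_1}{\sigma_1},\qquad \frac{x_2 - a_2}{\sigma_2},\qquad \frac{x_1 + x_2 - a_1 - a_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

have by hypothesis the same law of distribution. Hence, they have the same characteristic function $\varphi(t)$, whence we can draw the conclusion that the characteristic functions of $x_1$, $x_2$, $x_1 + x_2$ are, respectively,

$$\varphi_1(t) = e^{ia_1t}\varphi(\sigma_1t);\qquad \varphi_2(t) = e^{ia_2t}\varphi(\sigma_2t);\qquad \varphi_3(t) = e^{i(a_1+a_2)t}\varphi\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right).$$

Since

$$\varphi_3(t) = \varphi_1(t)\,\varphi_2(t),$$

we must have, for an arbitrary real $t$,

$$\varphi(\sigma_1t)\,\varphi(\sigma_2t) = \varphi\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right)$$

or

$$\varphi(\alpha t)\,\varphi(\beta t) = \varphi(t) \tag{1}$$

where

$$\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}},\qquad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}};\qquad \alpha^2 + \beta^2 = 1.$$

Since (1) holds for every real $t$, we shall have

$$\varphi(\alpha t) = \varphi(\alpha^2t)\,\varphi(\alpha\beta t);\qquad \varphi(\beta t) = \varphi(\alpha\beta t)\,\varphi(\beta^2t)$$

and

$$\varphi(t) = \varphi(\alpha^2t)\,\varphi(\alpha\beta t)^2\,\varphi(\beta^2t). \tag{2}$$

Applying (1) again to each of the factors in the right member of (2), we find that

$$\varphi(t) = \varphi(\alpha^3t)\,\varphi(\alpha^2\beta t)^3\,\varphi(\alpha\beta^2t)^3\,\varphi(\beta^3t), \tag{3}$$

and, proceeding in the same way, we arrive at the general formula

$$\varphi(t) = \varphi(\alpha^nt)^{p_0}\,\varphi(\alpha^{n-1}\beta t)^{p_1}\cdots\varphi(\beta^nt)^{p_n} \tag{4}$$

where $p_0, p_1, \ldots, p_n$ are the coefficients in the expansion

$$(1 + x)^n = p_0 + p_1x + \cdots + p_nx^n.$$

The arguments

$$v_0 = \alpha^nt,\quad v_1 = \alpha^{n-1}\beta t,\quad \ldots,\quad v_n = \beta^nt$$

tend uniformly to 0, since $\alpha < 1$, $\beta < 1$. The quotient

$$\frac{\varphi(v) - 1}{v^2} = \int_{-\infty}^{\infty}\frac{e^{ivx} - 1 - ivx}{v^2}\,dF(x) = -\int_{-\infty}^{\infty}x^2\left[\int_0^1 e^{ivxs}(1 - s)\,ds\right]dF(x),$$

where $F(x)$ is the reduced law of distribution, is represented by a uniformly convergent integral; hence

$$\lim_{v\to 0}\frac{\varphi(v) - 1}{v^2} = -\frac12\int_{-\infty}^{\infty}x^2\,dF(x) = -\frac12$$

or

$$\varphi(v) = 1 + \left[-\tfrac12 + \epsilon(v)\right]v^2,$$

where $\epsilon(v) \to 0$ as $v \to 0$. At the same time

$$\log\varphi(v) = \left[-\tfrac12 + \delta(v)\right]v^2\qquad\text{(principal branch of log)},$$

where again $\delta(v) \to 0$ as $v \to 0$. Now, taking logarithms of both members of (4),

$$\log\varphi(t) = -\tfrac12t^2\left(p_0\alpha^{2n} + p_1\alpha^{2n-2}\beta^2 + \cdots + p_n\beta^{2n}\right) + \Omega = -\tfrac12t^2 + \Omega,$$

where

$$\Omega = t^2\left[p_0\,\delta(v_0)\,\alpha^{2n} + p_1\,\delta(v_1)\,\alpha^{2n-2}\beta^2 + \cdots + p_n\,\delta(v_n)\,\beta^{2n}\right].$$

Given $\epsilon > 0$, we can take $n$ so large that

$$|\delta(v_i)| < \epsilon;\qquad i = 0, 1, \ldots, n,$$

whence

$$|\Omega| < \epsilon t^2.$$

Thus

$$\left|\log\varphi(t) + \tfrac12t^2\right| < \epsilon t^2,$$

and since $\epsilon$ can be taken arbitrarily small,

$$\log\varphi(t) + \tfrac12t^2 = 0,\qquad \varphi(t) = e^{-\frac{t^2}{2}},$$

which shows that the normal law is the only one with the required properties, among all laws with finite second moments.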
CHAPTER XIV

FUNDAMENTAL LIMIT THEOREMS 


1. Bernoulli's theorem, as we have seen in Chap. VII, follows from a more general one known as Laplace's limit theorem. In terms already familiar to us, this theorem can be stated as follows: Let an event $E$ occur $m$ times in a series of $n$ independent trials with constant probability $p$. As $n$ becomes infinite, the distribution function of the quotient

$$\frac{m - np}{\sqrt{npq}}$$

approaches

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du$$

as a limit; or, to state it in a less precise form, the distribution of the above quotient tends to normal.

Just as Bernoulli's theorem itself is a very particular case of the general law of large numbers, so Laplace's limit theorem is a special case of another extremely general theorem, the discovery of which by Laplace may be considered as the crowning achievement of his persistent efforts, extending over a period of more than twenty years, to find the approximate distribution of probability for sums consisting of a great many independent components with almost arbitrary distributions. The result at which Laplace finally arrived is as astonishing as it is simple: if $x_1, x_2, \ldots, x_n$ ($E(x_i) = 0$, $i = 1, 2, \ldots, n$) are independent variables (subject to some very mild limitations, not stated, however, by Laplace) and $B_n$ is the dispersion of their sum, then for large $n$ the distribution of the quotient

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is nearly normal. To put it more precisely, the distribution function of this quotient tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du$$

as $n$ becomes infinite.

Laplace’s attempt to prove this important proposition does not stand 
the test of modern rigor and, besides, cannot easily be made rigorous. 

The same is true of the attempts made by later investigators, notably Poisson, Cauchy, and many others. Only after a lapse of many years were truly rigorous proofs of Laplace's theorem given. This important achievement is the result of the work of three great Russian mathematicians: Tshebysheff (1887), Markoff (1898), and Liapounoff (1900-1901). An account of Tshebysheff's and Markoff's ingenious investigations is given in Appendix II. Here we shall follow Liapounoff, for his method of proof has the advantage of simplicity even compared with more recent proofs, of which that given by J. W. Lindeberg deserves special mention.¹

¹ Lindeberg's proof, as well as later proofs by P. Lévy and others, makes use of an ingenious artifice due to Liapounoff. Lindeberg explicitly acknowledges his indebtedness to Liapounoff, while Lévy and other French writers fail to give due credit to the great Russian mathematician.

2. Before going into details of analysis, we shall state the limit theorem in a very general form due to Liapounoff.

Laplace-Liapounoff's Theorem. Let $x_1, x_2, \ldots, x_n$ be independent variables with their means $= 0$, possessing absolute moments of the order $2 + \delta$ (where $\delta$ is some number $> 0$):

$$\mu_{2+\delta}^{(1)},\ \mu_{2+\delta}^{(2)},\ \ldots,\ \mu_{2+\delta}^{(n)}.$$

If, denoting by $B_n$ the dispersion of the sum $x_1 + x_2 + \cdots + x_n$, the quotient

$$\omega_n = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac\delta2}}$$

tends to 0 as $n \to \infty$, the probability of the inequality

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}} < t$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du.$$

It is natural that the complete proof of a theorem of such character cannot be too short, and to make the proof clearer it is advisable to divide it into logically separated parts.

3. The Fundamental Lemma. Let $s_n$ be a variable, depending on an integer $n$, with the mean $= 0$ and the standard deviation $= 1$. If its characteristic function

$$\varphi_n(v) = E\left(e^{ivs_n}\right)$$

tends to

$$e^{-\frac{v^2}{2}}$$

uniformly in any given finite interval $(-l, l)$, the distribution function $F_n(t)$ of $s_n$ tends uniformly (in the domain of all real values of $t$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du.$$

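Proof. a. Together with the variable $s_n$, whose distribution function is $F_n(t)$, Liapounoff considers another variable

$$t_n = s_n + y,$$

where $y$ is a normally distributed variable, independent of $s_n$, with the distribution function

$$G(y) = \frac{1}{h\sqrt\pi}\int_{-\infty}^{y}e^{-\frac{x^2}{h^2}}\,dx.$$

Denoting the distribution function of $t_n$ by $H_n(t)$, we have (Chap. XIII, Sec. 7)

$$H_n(t) = \frac{1}{\sqrt\pi}\int_{-\infty}^{\infty}dF_n(x)\int_{-\infty}^{\frac{t-x}{h}}e^{-u^2}\,du. \tag{1}$$

On account of the inequality

$$\frac{1}{\sqrt\pi}\int_a^{\infty}e^{-u^2}\,du \le \frac12e^{-a^2};\qquad a \ge 0,$$

we have:

for $t - x < 0$,

$$\frac{1}{\sqrt\pi}\int_{-\infty}^{\frac{t-x}{h}}e^{-u^2}\,du = \frac{\theta}{2}\,e^{-\frac{(t-x)^2}{h^2}};$$

for $t - x \ge 0$,

$$\frac{1}{\sqrt\pi}\int_{-\infty}^{\frac{t-x}{h}}e^{-u^2}\,du = 1 - \frac{1}{\sqrt\pi}\int_{\frac{t-x}{h}}^{\infty}e^{-u^2}\,du = 1 - \frac{\theta'}{2}\,e^{-\frac{(t-x)^2}{h^2}};$$

where $0 < \theta \le 1$, $0 < \theta' \le 1$.

Hence, introducing these expressions into (1),

$$H_n(t) = F_n(t) + \frac12\int_t^{\infty}\theta\,e^{-\frac{(t-x)^2}{h^2}}\,dF_n(x) - \frac12\int_{-\infty}^{t}\theta'\,e^{-\frac{(t-x)^2}{h^2}}\,dF_n(x),$$

where again $0 < \theta \le 1$, $0 < \theta' \le 1$. This leads to the following inequality:

$$|H_n(t) - F_n(t)| \le \frac12\int_{-\infty}^{\infty}e^{-\frac{(t-x)^2}{h^2}}\,dF_n(x).$$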


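But

$$e^{-\frac{(t-x)^2}{h^2}} = \frac{h}{2\sqrt\pi}\int_{-\infty}^{\infty}e^{-\frac{h^2v^2}{4}}\,e^{iv(x-t)}\,dv,$$

and consequently

$$|H_n(t) - F_n(t)| \le \frac{h}{4\sqrt\pi}\left|\int_{-\infty}^{\infty}e^{-\frac{h^2v^2}{4} - ivt}\,\varphi_n(v)\,dv\right|.$$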


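$$|H_n(t) - F_n(t)| \le \frac{h}{4\sqrt\pi}\left(\int_{-\infty}^{\infty}e^{-\frac{h^2v^2}{4}}\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right|dv + \int_{-\infty}^{\infty}e^{-\frac{h^2v^2}{4} - \frac{v^2}{2}}\,dv\right). \tag{2}$$

Here we split the first integral into three, $J_1$, $J_2$, $J_3$, taken respectively between the limits $-\infty, -l$; $-l, l$; $l, +\infty$, and denote the second integral by $J_4$. Since $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le 2$, we shall have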




because 


for positive x. Also 




T'e-?*. 


To estimate $J_2$, we shall denote by $\epsilon_n(l)$ the maximum of $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right|$ in the interval $(-l, l)$. Then

A ,r, ^ A«-(0 f* _ 1 m 

Finally, taking into account (2), (3), (4), and (5), we find 

(*/)« 

(6) IffnW - F.(0I < i..(0 + ;^ + :^ T- 

b. Expression (1) of $H_n(t)$ can be transformed in a manner similar to that employed in Chap. XIII, Sec. 7, if we first write

$$\frac{1}{\sqrt\pi}\int_{-\infty}^{\frac{t-x}{h}}e^{-u^2}\,du = \frac12 + \frac{1}{\sqrt\pi}\int_0^{\frac{t-x}{h}}e^{-u^2}\,du.$$

Thus we get


1 1 r * T 


-»yi - e-v,(t>) 


dv 


or 


rx 1 . 1 f" -^’-?8into, , 1 C‘e~ * , -'4 

= 2 + iJo ® + 2^ j_ ir-^* - 


Now 


rjo V Tjo 




4*-, 


X- 


VC 


4r 


since 


0 < 1 — 6 ^ 

4 


and consequently


(7) |h.(.) -1 - 

To find an upper bound of the integral in the right member, we split it into five integrals $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, taken respectively between the limits $-\infty, -l$; $-l, -\lambda$; $-\lambda, \lambda$; $\lambda, l$; $l, +\infty$. To estimate $I_3$, we notice that

k.(«') - i| ^ “ ? 


e * - 1 g 


and 

Hence 

( 8 ) 


kn(v) — C 2 | ^ t;*. 


To estimate $I_2 + I_4$, we use the inequality $\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le \epsilon_n(l)$, and we get

(I) 


(9) 




y/icKK 


Finally, dealing with $I_1$ and $I_5$, we use the obvious inequality

$$\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \le 2$$

and we obtain


i''+'‘I s u 


" -^‘dv ^ 4e~~ 
* » ^ T (hiy 


Taking into account (7), (8), (9), and (10), the following inequality 
results: 




-I^ sin to 


„ + ^ + jM. + * 

" ^ 4r ^ 2x + ^ IT (hiy 


In it, since $\lambda$ is still at our disposal, we can take

X = €n(0*A-*. 

The inequality thus obtained, when combined with (6), gives ($a = hl$)


r / 4 \ 1 if" “?sin tv,\ ^ ^ , 2 e * , a , 

(11) F.{C, - 2 - « --^.1 < + :^ + 

(s + 2*"®' 

Here $a$ and $l$ are arbitrary positive numbers. We dispose of them in the following manner: Given an arbitrary positive number $\epsilon$, we take $a$ so large as to have

^o* _a^ 

4c * 2 c * 1 

IT « Vi a 3 

and after that we select $l$ large enough to make

a . ^ 1 

VSl ^^ 3"* 

Finally, since for a fixed $l$, $\epsilon_n(l)$, by hypothesis, tends to 0 when $n \to \infty$, there exists a number $n_0$ such that


(s + + i-ro < r 

for all $n > n_0$. The inequality (11) then shows that

$$\left|F_n(t) - \frac12 - \frac1\pi\int_0^{\infty}e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv\right| < \epsilon$$


for $n > n_0$, and this means that

$$\lim_{n\to\infty}F_n(t) = \frac12 + \frac1\pi\int_0^{\infty}e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du$$

uniformly in $t$, because the number $n_0$, as clearly follows from the preceding analysis, depends upon $\epsilon$ only and not upon $t$.

Remark 1. Without changing anything in the proof, we can state the fundamental lemma in a slightly generalized form as follows: If $t_n$ tends to the limit $t$, the probability of the inequality

$$s_n < t_n$$

tends to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du.$$

Remark 2. The fundamental lemma, although not explicitly stated by Liapounoff, is implicitly contained in his proof. More general propositions of the same nature have been published by Pólya and Lévy. The very elegant result due to the latter can be stated as follows: If the characteristic function of the variable $s_n$ tends to the characteristic function

$$\varphi(t) = \int_{-\infty}^{\infty}e^{itx}\,dF(x)$$

of a fixed distribution uniformly in any finite interval, then

$$\lim F_n(t) = F(t)$$

at any point of continuity of $F(t)$.

The above proof, corresponding to the particular case

$$F(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du,$$

can be used, almost without any changes, in proving the general proposition of Lévy.

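4. Proof of Liapounoff's Theorem. a. If Liapounoff's condition $\omega_n \to 0$ is satisfied for a certain $\delta > 0$, it will be satisfied for all smaller $\delta$.

Let $f_i(t)$ be the distribution function of $x_i$ ($i = 1, 2, \ldots, n$). The sum

$$f(t) = f_1(t) + f_2(t) + \cdots + f_n(t),$$

being a nondecreasing function of $t$, the following inequality holds (Chap. XIII, Sec. 5):

$$\left(\int_{-\infty}^{\infty}|t|^b\,df(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty}|t|^c\,df(t)\right)^{a-b}\left(\int_{-\infty}^{\infty}|t|^a\,df(t)\right)^{b-c},$$

provided $a > b > c > 0$. We take here

$$a = 2 + \delta,\qquad b = 2 + \delta',\qquad c = 2,$$

supposing $0 < \delta' < \delta$. Then, since $\int t^2\,df(t) = B_n$,

$$\left(\sum_{i=1}^{n}\mu_{2+\delta'}^{(i)}\right)^{\delta} \le B_n^{\delta-\delta'}\left(\sum_{i=1}^{n}\mu_{2+\delta}^{(i)}\right)^{\delta'}.$$

But this inequality is equivalent to

$$\frac{\sum_{i=1}^{n}\mu_{2+\delta'}^{(i)}}{B_n^{1+\frac{\delta'}{2}}} \le \left(\frac{\sum_{i=1}^{n}\mu_{2+\delta}^{(i)}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{\delta'}{\delta}},$$

and it shows that

$$\frac{\sum_{i=1}^{n}\mu_{2+\delta'}^{(i)}}{B_n^{1+\frac{\delta'}{2}}} \to 0,$$

provided $0 < \delta' < \delta$. Hence, in the proof we can assume that the fundamental condition is satisfied for some positive $\delta \le 1$.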

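b. Liapounoff's inequality (Chap. XIII, Sec. 5) with $c = 0$, $b = 2$, $a = 2 + \delta$, when applied to $x_i$, gives

$$b_i^{2+\delta} \le \left(\mu_{2+\delta}^{(i)}\right)^2;\qquad b_i = E(x_i^2).$$

Hence,

$$\frac{b_i}{B_n} \le \left(\frac{\mu_{2+\delta}^{(i)}}{B_n^{1+\frac\delta2}}\right)^{\frac{2}{2+\delta}} \le \omega_n^{\frac{2}{2+\delta}},$$

and, since it is assumed that $\omega_n \to 0$, all the quotients

$$\frac{b_i}{B_n};\qquad B_n = b_1 + b_2 + \cdots + b_n\quad (i = 1, 2, \ldots, n)$$

will converge to 0 uniformly as $n \to \infty$.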
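c. The following formula can easily be obtained by means of integration by parts:

$$e^{ix} = 1 + ix - \frac{x^2}{2} - x^2\int_0^1\left(e^{ixt} - 1\right)(1 - t)\,dt.$$

If $x$ is real and in absolute value $\ge 2$, we have

$$\left|x^2\int_0^1\left(e^{ixt} - 1\right)(1 - t)\,dt\right| \le x^2 \le \frac{|x|^{2+\delta}}{2^\delta},$$

since

$$\left|e^{ixt} - 1\right| \le 2.$$

If $|x| \le 2$, we can use the inequality

$$\left|e^{ixt} - 1\right| \le |x|\,t$$

and find

$$\left|x^2\int_0^1\left(e^{ixt} - 1\right)(1 - t)\,dt\right| \le \frac{|x|^3}{6} \le \frac{2^{1-\delta}|x|^{2+\delta}}{6} = \frac{|x|^{2+\delta}}{3\cdot 2^\delta} < \frac{|x|^{2+\delta}}{2^\delta}.$$

Thus, for every real $x$,

$$e^{ix} = 1 + ix - \frac{x^2}{2} + \vartheta\,\frac{|x|^{2+\delta}}{2^\delta};\qquad |\vartheta| \le 1. \tag{12}$$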
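The bound (12), $|e^{ix} - 1 - ix + x^2/2| \le |x|^{2+\delta}/2^\delta$ for real $x$ and $0 < \delta \le 1$, can be spot-checked on a grid. The sketch below computes the ratio of the left member to the right member for $|x| \le 50$ and a few illustrative values of $\delta$. A grid check is of course no substitute for the proof above.

```python
import cmath


def remainder_ratio(x, delta):
    """|e^{ix} - 1 - ix + x^2/2| divided by |x|^(2+delta) / 2^delta (for x != 0)."""
    r = abs(cmath.exp(1j * x) - 1 - 1j * x + x * x / 2)
    return r / (abs(x) ** (2 + delta) / 2 ** delta)


worst = max(
    remainder_ratio(x, d)
    for d in (0.1, 0.5, 1.0)
    for x in [k / 100 for k in range(-5000, 5001) if k != 0]
)
print(worst)   # stays below 1 on this grid
```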

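Substituting here

$$x = \frac{x_kt}{\sqrt{B_n}}$$

and taking the mathematical expectation of both members, we have

$$\varphi_k(t) = E\left(e^{\frac{itx_k}{\sqrt{B_n}}}\right) = 1 - \frac{b_kt^2}{2B_n} + \vartheta_k\,\frac{\mu_{2+\delta}^{(k)}|t|^{2+\delta}}{2^\delta B_n^{1+\frac\delta2}};\qquad |\vartheta_k| \le 1. \tag{13}$$

Furthermore, since

$$1 - x = e^{-x} - \frac{\theta}{2}x^2;\qquad x > 0;\quad 0 < \theta < 1,$$

we can write

$$1 - \frac{b_kt^2}{2B_n} = e^{-\frac{b_kt^2}{2B_n}} - \frac{\theta_k}{2}\left(\frac{b_kt^2}{2B_n}\right)^2;\qquad 0 < \theta_k < 1. \tag{14}$$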

If < 1, we shall have, by virtue of (12), 




2\2bJ 
and consequently

This inequality, together with (13) and (14), leads to the following expression of $\varphi_k(t)$:

$$\varphi_k(t) = e^{-\frac{b_kt^2}{2B_n}}\,(1 + \sigma_k) \tag{15}$$

where 

(16) hi < le* < 3^.|ir*. 

O 1+- 14-- 

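d. The characteristic function of the variable

$$s = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is

$$\varphi(t) = \varphi_1(t)\,\varphi_2(t)\cdots\varphi_n(t)$$

because $x_1, x_2, \ldots, x_n$ are independent variables. Hence, by (15),

$$\varphi(t) = e^{-\frac{t^2}{2}}(1 + \sigma_1)(1 + \sigma_2)\cdots(1 + \sigma_n),$$

$$\left|\varphi(t) - e^{-\frac{t^2}{2}}\right| \le (1 + |\sigma_1|)(1 + |\sigma_2|)\cdots(1 + |\sigma_n|) - 1 \le e^{|\sigma_1| + |\sigma_2| + \cdots + |\sigma_n|} - 1$$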


and 

(17) 1^(0 - - 1 

taking into account inequalities (16). Inequality (17) holds if 

< 1 . 

Suppose, now, that $t$ is confined to an arbitrary finite interval

$$-l \le t \le l.$$

Because $\omega_n$, by hypothesis, tends to 0, the right member of (17) will tend to 0 as $n \to \infty$. This shows that

$$\varphi(t) \to e^{-\frac{t^2}{2}}$$

uniformly in any finite interval. It suffices now to invoke the fundamental lemma to complete the proof of Liapounoff's theorem.

5. Particular Cases. This theorem is extremely general, and it is hardly possible to find cases of any practical importance to which it could not be applied. Two particularly significant cases deserve special mention.

First Case. Let us suppose that the variables $x_1, x_2, \ldots, x_n$ are bounded, so that any possible value of any one of them is absolutely less than a constant $C$. Evidently

$$\mu_{2+\delta}^{(i)} = E|x_i|^{2+\delta} \le C^\delta E(x_i^2) = C^\delta b_i$$

and hence

$$\omega_n \le \frac{C^\delta}{B_n^{\frac\delta2}}.$$

It suffices to assume that

$$B_n = b_1 + b_2 + \cdots + b_n$$

tends to infinity to be sure that $\omega_n \to 0$. Hence, dealing with bounded independent variables, the condition for the validity of the limit theorem is

$$B_n \to \infty\quad\text{as } n \to \infty,$$

which is equivalent to the statement that the series

$$b_1 + b_2 + b_3 + \cdots$$

is divergent.

Poisson's series of trials affords a good illustration of this case. In the usual way, we attach to each of the trials a variable which assumes two values, 1 and 0, according as an event $E$ occurs or fails in that trial. Let $p_i$ and $q_i = 1 - p_i$ be the respective probabilities of the occurrence and failure of $E$ in the $i$th trial. The variable $z_i$ attached to this trial is defined by

$$z_i = 1\ \text{if } E \text{ occurs},\qquad z_i = 0\ \text{if } E \text{ fails}.$$

Noticing that

$$E(z_i) = p_i,$$

we introduce new variables

$$x_i = z_i - p_i\qquad (i = 1, 2, \ldots, n)$$

with the mean 0, whose sum is given by

$$m - np,$$

where $m$ is the number of occurrences of $E$ in $n$ trials and $p$ the mean probability

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}.$$

In our case

$$E(x_i^2) = p_iq_i$$

and

$$B_n = \sum_{i=1}^{n}p_iq_i.$$

Hence, we can formulate the following theorem:

Theorem. The probability of the inequality

$$m - np < t\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$, provided the series

$$\sum p_iq_i$$

is divergent. At the same time the probability of the inequalities

$$t_1\sqrt{B_n} < m - np < t_2\sqrt{B_n}$$

tends uniformly (in $t_1 < t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2}e^{-\frac{u^2}{2}}\,du.$$
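To see the theorem at work, the sketch below computes the exact distribution of $m$ for a Poisson series of trials by adding one trial at a time. It then compares $P(m - np < t\sqrt{B_n})$ with the normal limit. The probabilities $p_i$, the number of trials and $t = 1$ are illustrative choices, not values from the text. The sketch also evaluates Liapounoff's ratio with $\delta = 1$. Since $|x_i| < 1$, the bounded-case estimate gives $\omega_n \le 1/\sqrt{B_n}$.

```python
import math


def poisson_trials_distribution(ps):
    """Exact distribution of m, the number of occurrences of E in independent
    trials with probabilities p_1, ..., p_n (Poisson's series of trials)."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * (1 - p)   # E fails in this trial
            new[m + 1] += prob * p     # E occurs in this trial
        dist = new
    return dist


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


n = 800
ps = [0.1 + 0.8 * (i % 10) / 9 for i in range(n)]   # illustrative, varying p_i
np_ = sum(ps)                                       # n times the mean probability p
B = sum(p * (1 - p) for p in ps)                    # B_n = sum of p_i q_i
# Liapounoff's ratio with delta = 1, using E|x_i|^3 = p_i q_i (p_i^2 + q_i^2)
omega = sum(p * (1 - p) * (p * p + (1 - p) ** 2) for p in ps) / B ** 1.5

dist = poisson_trials_distribution(ps)
t = 1.0
prob = sum(pr for m, pr in enumerate(dist) if m - np_ < t * math.sqrt(B))
print(omega, 1 / math.sqrt(B))
print(prob, normal_cdf(t))
```

The remaining gap between `prob` and the normal value is mainly a lattice effect of order $1/\sqrt{B_n}$.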

Second Case. Let $z_1, z_2, \ldots, z_n$ be identical variables with the common mean $a$ and dispersion $b$. Supposing that for some positive $\delta$

$$E|z_i - a|^{2+\delta} = c$$

exists, we have

$$\omega_n = \frac{nc}{(nb)^{1+\frac\delta2}} = \frac{c}{b^{1+\frac\delta2}\,n^{\frac\delta2}},$$

and hence $\omega_n \to 0$ as $n \to \infty$. The limit theorem applied to this case can be stated as follows:

The probability of the inequality

$$z_1 + z_2 + \cdots + z_n - na < t\sqrt{nb}$$

tends uniformly to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du,$$

provided

$$E|z_i - a|^{2+\delta}$$

exists for some positive $\delta$. As a corollary we have: The probability of the inequalities

$$-t\sqrt{\frac bn} < \frac{z_1 + z_2 + \cdots + z_n}{n} - a < t\sqrt{\frac bn}$$

tends to

$$\frac{2}{\sqrt{2\pi}}\int_0^{t}e^{-\frac{u^2}{2}}\,du.$$

This proposition is regarded as justification of the ordinary procedure of taking a mean of several observed measurements of the same quantity, made under the same conditions, to approximate its "true value." Barring systematic errors, which should be eliminated by a careful study of the tools used for measurements, the true value of the unknown quantity is regarded as coinciding with the expectation of a set of potentially possible values, each having a certain probability of materializing in actual measurement. Since for comparatively small $t$ the above integral comes very near to 1 and

$$t\sqrt{\frac bn}$$

for large $n$ becomes as small as we please, the probability that the mean of a very large number of observations deviates very little from the true value of the quantity to be measured will be close to 1, and herein lies the justification of the rule of the mean mentioned above.


Estimation of the Error Term
6. The limit theorem is a proposition of an essentially asymptotic character. It states merely that the distribution function $F_n(t)$ of the variable

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du$$

as $n$ becomes infinite when a certain condition is fulfilled. For practical purposes it is very important to estimate the error committed by replacing $F_n(t)$ by its limit when $n$ is a finite but very large number. In his original paper Liapounoff had this important problem in mind, and for that reason entered into a more detailed elaboration of various parts of his proof than was strictly necessary to establish an asymptotic theorem.

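We do not intend to reproduce here this part of Liapounoff's investigation; it suffices to indicate the final result. Assuming the existence of absolute moments of the third order $E|x_i|^3$ ($i = 1, 2, \ldots, n$), we shall suppose $n$ so large that

$$\omega_n = \frac{E|x_1|^3 + E|x_2|^3 + \cdots + E|x_n|^3}{B_n^{3/2}} \le \frac{1}{20}.$$

Then, setting

$$F_n(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{u^2}{2}}\,du + R,$$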

we shall have 

Ifll < |-n[(l0g + l.l] + log gL + . 

Although this limit for the error term is probably too high, it seems 
to be the best available. However, it is greatly desirable to have a more 
genuine estimation of R. 

7. Hypothesis of Elementary Errors. It is considered as an experimental fact that accidental errors of observations (or measurements) follow closely the law of normal distribution. In the sphere of biology, similar phenomena have been observed as to the size of the bodies and various organs of living organisms. What can be suggested as an explanation of these observed facts? In regard to errors of observations, Laplace proposed a hypothesis which may sound plausible. He considers the total error as a sum of numerous very small elementary errors due to independent causes.

It can hardly be doubted that various independent or nearly independent causes contribute to the total error. In astronomical observations, for instance, slight changes in the temperature, irregular currents of air, vibrations of buildings, and even the state of the organs of perception of an observer may be considered as but a small part of such causes. One can easily understand that the growth of the organs of living organisms is also dependent on many factors of accidental character which independently tend to increase or decrease the size of the organs. If, on the ground of such evidence, we accept Laplace's hypothesis, we can try to explain the normal law of distribution on the basis of the general theorems established above.

Suppose that elementary errors do not exceed in absolute value a certain number $l$, very small compared with the standard deviation $\sigma$ of their sum. The quantity denoted by $\omega_n$ in the preceding section will be less than the ratio $l/\sigma$ and hence will be a small number; and the same will be true of the error term $R$. Hence, the distribution of the total error will be nearly normal.

Laplace's explanation of the observed prevalence of normal distributions may be accepted as plausible, at least. But the question may be raised whether elementary errors are small enough and numerous enough to make the difference between the true distribution function of the total error and that of a normal distribution small. Besides, Laplace's hypothesis is based on the principle of superposition of small effects and thus introduces another assumption of an arbitrary character.

Finally, the experimental data quoted in support of the normal distribution of errors of observations and biological measurements are not numerous enough for one to place full confidence in them. Hence, the widely accepted statistical theories based on the normal law of distribution cannot be fully relied on and may be considered merely as substitutes for more accurate knowledge which we do not yet possess in dealing with problems of vital importance in the sphere of human activities.

Limit Theorems for Dependent Variables 

8. The fundamental limit theorem can be extended to sums of dependent variables, as, under special assumptions, was shown first by Markoff and later by S. Bernstein, whose work may be considered an outstanding recent contribution to the theory of probability. However, the conditions for the validity of the theorems established by Bernstein are rather complicated, and the whole subject seems to lack ultimate simplicity. For that reason we confine ourselves here to a few special cases.

Example 1. Let us consider a simple chain in which the probabilities for an event $E$ to occur in any trial are $p'$ and $p''$, respectively, according as $E$ occurred or failed in the preceding trial. The probability for $E$ to occur at the $n$th trial when the results of other trials are unknown is

$$p_n = p + (p_1 - p)\,\delta^{n-1}$$

where $p_1$ is the initial probability, $\delta = p' - p''$, and

$$p = \frac{p''}{1 - \delta}.$$

The mean probability for $n$ trials is given by

$$P_n = \frac{p_1 + p_2 + \cdots + p_n}{n} = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta},$$

so that $p$ may be considered as the mean probability in infinitely many trials.

In the usual way, to trials $1, 2, 3, \ldots$ we attach variables $x_1, x_2, x_3, \ldots$ so that in general

$$x_i = 1 - p_i\qquad\text{or}\qquad x_i = -p_i$$

according as $E$ occurs or fails in the $i$th trial. If $m$ is the number of occurrences of $E$ in $n$ trials, the sum

$$x_1 + x_2 + \cdots + x_n$$

of dependent variables represents

$$m - nP_n.$$

Evidently

$$E(m - nP_n) = 0$$

and, as we have seen in Chap. XI, Sec. 7,

$$B_n = E(m - nP_n)^2 \sim npq\,\frac{1 + \delta}{1 - \delta}\qquad (q = 1 - p),$$

that is, the ratio of $B_n$ to $npq\,\frac{1+\delta}{1-\delta}$ tends to 1 as $n$ becomes infinite.


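In order to find an appropriate expression of the characteristic function of the quotient

$$\frac{m - nP_n}{\sqrt{B_n}},$$

we shall endeavor first to find the generating function $\omega_n(t)$ for the probabilities

$$P_{m,n}\qquad (m = 0, 1, 2, \ldots, n)$$

to have exactly $m$ occurrences of $E$ in $n$ trials. Let $A_{m,n}$ be the probability of $m$ occurrences when the whole series ends with $E$, and similarly $B_{m,n}$ the probability of $m$ occurrences when this series ends with $F$, the event opposite to $E$. The following relations follow immediately from the definition of a chain:

$$A_{m,n+1} = A_{m-1,n}\,p' + B_{m-1,n}\,p'',\qquad B_{m,n+1} = A_{m,n}\,q' + B_{m,n}\,q''. \tag{18}$$

Let

$$\mathcal{A}_n(t) = \sum_{m=0}^{n}A_{m,n}\,t^m,\qquad \mathcal{B}_n(t) = \sum_{m=0}^{n}B_{m,n}\,t^m$$

be the generating functions of $A_{m,n}$ and $B_{m,n}$. From relations (18) it follows that

$$\mathcal{A}_{n+1}(t) = p't\,\mathcal{A}_n(t) + p''t\,\mathcal{B}_n(t),\qquad \mathcal{B}_{n+1}(t) = q'\mathcal{A}_n(t) + q''\mathcal{B}_n(t). \tag{19}$$

These relations, established for $n \ge 1$, will hold even for $n = 0$ if we define $\mathcal{A}_0(t)$ and $\mathcal{B}_0(t)$ by

$$p'\mathcal{A}_0 + p''\mathcal{B}_0 = p_1,\qquad q'\mathcal{A}_0 + q''\mathcal{B}_0 = 1 - p_1,$$

whence

$$\mathcal{A}_0 + \mathcal{B}_0 = 1.$$

From (19) one can easily conclude that both $\mathcal{A}_n(t)$ and $\mathcal{B}_n(t)$ satisfy the same equation in finite differences of the second order:

$$\mathcal{A}_{n+2} - (p't + q'')\,\mathcal{A}_{n+1} + \delta t\,\mathcal{A}_n = 0,\qquad \mathcal{B}_{n+2} - (p't + q'')\,\mathcal{B}_{n+1} + \delta t\,\mathcal{B}_n = 0.$$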
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Evidently 

hence 


Pm,n 

<On{t) ■ 


' Am.n Em.nt 

end) 4- rf^nd) 


satisfies the equation 

(20) w«+* — (pH 4- 9")«»+i 4- s* 0 

and is completely determined by it and the initial conditions 

tfo *■ 1, «i = 9i 4“ Pit. 

Since 

p' *= p 4- g" * 9 4- p« 

the characteristic equation corresponding to (20) can be written 

(f - I)(r - a) = « - i)[(p + q6)t - ai 

and for small < — 1 its roots can be expanded into power series 

fi =» 1 4- cid — 1) 4- Ct(i — 1)* 4” • • • 
r* = a 4“ di(i — 1) 4" dtd ■“ 1)* 4" * * * . 

The general expression of wn(0 will be 

«n(o = ArT + = Ar? + Bs-nr 

where to satisfy the initial conditions we must take 


- gi - Pit, 


B 


r* “fi 

Having found $\omega_n(t)$, the characteristic function of

$$\frac{m - nP_n}{\sqrt{B_n}}$$

will be given by

$$\varphi_n(v) = e^{-\frac{inP_nv}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right).$$

To study the asymptotic behavior of $\varphi_n(v)$ when $v$ is confined to a finite fixed interval $-l \le v \le l$, we notice that then

$$u = \frac{v}{\sqrt{B_n}}$$

will be well within the convergence region of the series we are going to consider now. By means of Lagrange's series or otherwise, we find the following expansion of $\log r_1$ in a power series of $t - 1$:

$$\log r_1 = p(t - 1) + \left(\frac{pq\delta}{1 - \delta} - \frac{p^2}{2}\right)(t - 1)^2 + \cdots,$$

convergent for sufficiently small values of $t - 1$. By setting $t = e^{iu}$ we obtain another power series in $u$,

$$\log r_1 = piu - \frac{pq}{2}\cdot\frac{1 + \delta}{1 - \delta}\,u^2 + \cdots,$$

convergent for sufficiently small $u$. Hence

$$r_1^ne^{-npiu} = e^{-\frac{npq}{2}\cdot\frac{1 + \delta}{1 - \delta}u^2 + nu^3g(u)},$$

where $g(u)$ is a bounded function of $u$, $u$ being contained in a certain interval $(-r, r)$.
By substituting

$$u = \frac{v}{\sqrt{B_n}}$$

here, we easily conclude that

$$r_1^ne^{-inP_nu}$$

tends uniformly to the limit

$$e^{-\frac{v^2}{2}}$$

in the interval $-l \le v \le l$, while

$$r_2^ne^{-inP_nu}$$

remains there uniformly bounded. Since, as can easily be seen, $A$ and $B$ can be represented by power series

$$A = 1 + a_1u + a_2u^2 + \cdots, \qquad B = -a_1u - a_2u^2 - \cdots,$$

$A$ tends uniformly to 1 and $B$ tends uniformly to 0. Hence, finally, in any fixed interval $-l \le v \le l$, $\varphi_n(v)$ tends uniformly to $e^{-v^2/2}$. It suffices to apply the fundamental lemma to conclude that the probability of the inequality

$$\frac{m - nP_n}{\sqrt{B_n}} < t_n$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

if $t_n$ tends to $t$.


Since $B_n$ is asymptotic to $npq\,\frac{1 + \delta}{1 - \delta}$ and $P_n$ differs from $p$ by a quantity of the order $1/n$, the inequality

$$m - np < t\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

can be written in the form

$$m - nP_n < t_n\sqrt{B_n}$$

with $t_n$ tending to $t$, whence, using the above established result, the following theorem due to Markoff can be derived:

Theorem. For a simple chain the probability of the inequalities

$$t_1\sqrt{npq\,\frac{1 + \delta}{1 - \delta}} < m - np < t_2\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
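As a numerical illustration of Example 1 (not part of the original argument), the following Python sketch computes the exact distribution of $m$ for a simple chain by iterating relations (18), and compares $\mathrm{Var}(m)/n$ with $pq\,\frac{1+\delta}{1-\delta}$. The helper names, the values $p' = 0.7$, $p'' = 0.4$, and the choice $p_1 = p = p''/(1 - \delta)$ (the limiting probability, taken here as the mean probability $p$ of the text) are assumptions made only for this sketch.

```python
def chain_distribution(n, p1, p_after_E, p_after_F):
    """Exact probabilities P_{m,n} (m = 0..n) for a simple chain.

    p_after_E plays the role of p' (probability of E after E),
    p_after_F the role of p'' (probability of E after F),
    p1 is the probability of E at the first trial.
    A[m]: m occurrences and last trial E; B[m]: m occurrences and last trial F.
    """
    A = [0.0, p1]          # after one trial: m = 1 with last trial E
    B = [1.0 - p1, 0.0]    # after one trial: m = 0 with last trial F
    for _ in range(n - 1):
        new_A = [0.0] * (len(A) + 1)
        new_B = [0.0] * (len(A) + 1)
        for m in range(len(A)):
            # relations (18): the next trial gives E, so m grows by one
            new_A[m + 1] += A[m] * p_after_E + B[m] * p_after_F
            # the next trial gives F, so m is unchanged
            new_B[m] += A[m] * (1 - p_after_E) + B[m] * (1 - p_after_F)
        A, B = new_A, new_B
    return [a + b for a, b in zip(A, B)]


def mean_and_variance(P):
    mean = sum(m * pm for m, pm in enumerate(P))
    var = sum((m - mean) ** 2 * pm for m, pm in enumerate(P))
    return mean, var


p_prime, p_dprime = 0.7, 0.4        # hypothetical p' and p''
delta = p_prime - p_dprime
p = p_dprime / (1 - delta)          # limiting probability of E
q = 1 - p
n = 400
P = chain_distribution(n, p, p_prime, p_dprime)
mean, var = mean_and_variance(P)
print(var / n, p * q * (1 + delta) / (1 - delta))
```

Starting the chain with $p_1 = p$ keeps every trial's probability of $E$ equal to $p$, so $E(m) = np$ exactly and the remaining discrepancy in $\mathrm{Var}(m)/n$ is of order $1/n$.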

Example 2. Considering an indefinite series of Bernoullian trials with the probability $p$ for an event $A$ to occur, we can regard pairs of consecutive trials 1 and 2, 2 and 3, 3 and 4, and so on, as forming a new series of trials which may produce an event $E$ consisting of two successive occurrences of $A$ ($E = AA$) or an event $F$ opposite to $E$ ($F = AB, BA, BB$). With respect to $E$ the trials of the new series are no longer independent. Let $m$ be the number of occurrences of $E$ in $n$ trials. Then

$$E(m - np^2) = 0$$

and

$$B_n = E(m - np^2)^2 = np^2q(1 + 3p) - 2p^3q,$$

as was shown in Chap. XI, Sec. 6.

Let $P_{m,n}$ be the probability of exactly $m$ occurrences of $E$ in a series of $n$ trials. Evidently

$$P_{m,n} = A_{m,n} + B_{m,n},$$

where $A_{m,n}$ and $B_{m,n}$ are the probabilities of $m$ occurrences of $E$ when the Bernoullian series of $n + 1$ trials ends with $A$ or $B$, respectively. By an easy application of the theorems of total and compound probabilities we get

$$A_{m,n+1} = A_{m-1,n}\,p + B_{m,n}\,p, \qquad B_{m,n+1} = A_{m,n}\,q + B_{m,n}\,q.$$

Corresponding to these relations, the generating functions

$$\varphi_n(t) = \sum_{m=0}^{n} A_{m,n}t^m, \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

satisfy the following equations in finite differences:

$$\varphi_{n+1} = pt\,\varphi_n + p\,\psi_n, \qquad \psi_{n+1} = q\,\varphi_n + q\,\psi_n,$$

holding even for $n = 0$ if we set $\varphi_0 = p$, $\psi_0 = q$. Hence it follows that $\varphi_n(t)$ and $\psi_n(t)$ satisfy the same equation of the second order,

$$\varphi_{n+2} - (pt + q)\varphi_{n+1} + pq(t - 1)\varphi_n = 0, \qquad \psi_{n+2} - (pt + q)\psi_{n+1} + pq(t - 1)\psi_n = 0,$$

and so does their sum

$$\omega_n(t) = \varphi_n(t) + \psi_n(t) = \sum_{m=0}^{n} P_{m,n}t^m.$$
Thus, to determine $\omega_n(t)$ we have the equation

$$\omega_{n+2} - (pt + q)\omega_{n+1} + pq(t - 1)\omega_n = 0$$

and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = 1 - p^2 + p^2t.$$

The general expression of $\omega_n(t)$ is

$$\omega_n(t) = Ar_1^n + Br_2^n = Ar_1^n + B\,p^nq^n(t - 1)^nr_1^{-n},$$

where $r_1$ and $r_2$ are roots of the equation

$$r^2 - r = p(t - 1)(r - q)$$

and

$$A = \frac{r_2 - 1 - p^2(t - 1)}{r_2 - r_1}, \qquad B = \frac{-r_1 + 1 + p^2(t - 1)}{r_2 - r_1}.$$

If $r_1$ is the root which for $t = 1$ reduces to 1, we easily find the following series:

$$\log r_1 = p^2(t - 1) + \frac{p^3(2 - 3p)}{2}(t - 1)^2 + \cdots,$$

or, setting $t = e^{iu}$ and supposing $u$ sufficiently small,

$$\log r_1 = ip^2u - \frac{p^2q(1 + 3p)}{2}u^2 + \cdots.$$

As to $A$ and $B$, they can be developed into series of the form

$$A = 1 + cu^2 + \cdots, \qquad B = -cu^2 + \cdots.$$

Hence, reasoning in the same manner as in Example 1, we can conclude that the characteristic function

$$\varphi_n(v) = e^{-\frac{inp^2v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right)$$

of the variable

$$\frac{m - np^2}{\sqrt{B_n}}$$

tends to the limit $e^{-v^2/2}$ uniformly in any finite and fixed interval $-l \le v \le l$. Referring, finally, to the fundamental lemma, we reach the following conclusion: the probability of the inequalities

$$t_1\sqrt{np^2q(1 + 3p)} < m - np^2 < t_2\sqrt{np^2q(1 + 3p)}$$

tends uniformly (with respect to $t_1$ and $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
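The recurrences of Example 2 can be checked in exact rational arithmetic. The sketch below (an illustration, not the author's method; the function name and the value $p = 1/3$ are hypothetical choices) builds $P_{m,n}$ from the relations for $A_{m,n}$, $B_{m,n}$ with $\varphi_0 = p$, $\psi_0 = q$, and confirms $E(m) = np^2$ and $B_n = np^2q(1 + 3p) - 2p^3q$ for small $n$.

```python
from fractions import Fraction


def pair_distribution(n, p):
    """Exact P_{m,n}: m = number of pairs AA among n overlapping pairs.

    A[m] / B[m]: probability of m occurrences of E = AA when the
    underlying Bernoullian series of n + 1 trials ends with A / with B.
    """
    q = 1 - p
    A, B = [p], [q]                      # n = 0: one trial, phi_0 = p, psi_0 = q
    for _ in range(n):
        new_A = [Fraction(0)] * (len(A) + 1)
        new_B = [Fraction(0)] * (len(A) + 1)
        for m in range(len(A)):
            new_A[m + 1] += A[m] * p         # ...A followed by A: one more E
            new_A[m] += B[m] * p             # ...B followed by A: no new E
            new_B[m] += (A[m] + B[m]) * q    # ...followed by B: no new E
        A, B = new_A, new_B
    return [a + b for a, b in zip(A, B)]


p = Fraction(1, 3)
q = 1 - p
for n in range(1, 8):
    P = pair_distribution(n, p)
    mean = sum(m * pm for m, pm in enumerate(P))
    var = sum((m - mean) ** 2 * pm for m, pm in enumerate(P))
    print(n, mean == n * p**2, var == n * p**2 * q * (1 + 3 * p) - 2 * p**3 * q)
```

Using `Fraction` avoids any rounding, so the comparison with the closed form for $B_n$ is an exact equality rather than an approximation.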
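Problems for Solution

1. Consider a series of independent variables $x_1, x_2, x_3, \ldots$ where in general $x_k$ $(k = 1, 2, 3, \ldots)$ can have only two values $k^\alpha$ and $-k^\alpha$, each with the probability $\frac{1}{2}$. Show that the limit theorem holds for the variables thus defined if $\alpha > -\frac{1}{2}$, but the law of large numbers holds only if $\alpha < \frac{1}{2}$.

Solution. Evidently

$$E(x_k) = 0, \qquad E(x_k^2) = k^{2\alpha}, \qquad E|x_k|^3 = k^{3\alpha}.$$

From Euler's formula (Appendix I) we derive two asymptotic expressions:

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}, \qquad 1^{3\alpha} + 2^{3\alpha} + \cdots + n^{3\alpha} \sim \frac{n^{3\alpha+1}}{3\alpha + 1}.$$

Hence

$$\frac{1^{3\alpha} + 2^{3\alpha} + \cdots + n^{3\alpha}}{B_n^{3/2}} \sim \frac{(2\alpha + 1)^{3/2}}{(3\alpha + 1)\sqrt{n}} \to 0$$

(for $-\frac{1}{2} < \alpha \le -\frac{1}{3}$ the numerator grows at most like $\log n$ while $B_n \to \infty$, with the same conclusion), so that the limit theorem holds. For $\alpha \ge \frac{1}{2}$ the probability of the inequalities

$$-\varepsilon < \frac{x_1 + x_2 + \cdots + x_n}{n} < \varepsilon$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\varepsilon\sqrt{2}}^{\varepsilon\sqrt{2}} e^{-\frac{u^2}{2}}\,du < 1 \quad \left(\alpha = \tfrac{1}{2}\right), \qquad 0 \quad \left(\alpha > \tfrac{1}{2}\right),$$

and the law of large numbers does not hold.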
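A quick numerical check of the asymptotic expressions used in the solution of Problem 1 may be helpful. This sketch (hypothetical helper names; valid only for $3\alpha + 1 > 0$) compares the exact Liapounoff ratio with $(2\alpha + 1)^{3/2}/((3\alpha + 1)\sqrt{n})$.

```python
import math


def liapounoff_ratio(n, alpha):
    """(1^{3a} + ... + n^{3a}) / B_n^{3/2}, with B_n = 1^{2a} + ... + n^{2a}."""
    third = sum(k ** (3 * alpha) for k in range(1, n + 1))
    B_n = sum(k ** (2 * alpha) for k in range(1, n + 1))
    return third / B_n ** 1.5


def asymptotic_ratio(n, alpha):
    """(2a + 1)^{3/2} / ((3a + 1) sqrt(n)); assumes alpha > -1/3."""
    return (2 * alpha + 1) ** 1.5 / ((3 * alpha + 1) * math.sqrt(n))


for n in (100, 10_000):
    print(n, liapounoff_ratio(n, 1.0), asymptotic_ratio(n, 1.0))
```

Both columns decrease like $1/\sqrt{n}$, which is the content of Liapounoff's condition here.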

2. Let $m_i$ be the number of successes in $i$ Bernoullian trials with the probability $p$. Show that the limit theorem holds for the variables

$$s_i = \frac{m_i - ip}{\sqrt{ipq}}, \qquad i = 1, 2, \ldots, n,$$

but the law of large numbers does not hold (Bernstein).

Hint:

$$s_1 + s_2 + \cdots + s_n = (pq)^{-\frac{1}{2}}\left[\left(1 + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right)x_1 + \left(\frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right)x_2 + \cdots + \frac{1}{\sqrt{n}}\,x_n\right],$$

where $x_1, x_2, \ldots, x_n$ are independent variables with two values $q$ and $-p$ associated in the customary way with trials $1, 2, \ldots, n$.

3. Consider an infinite sequence of independent variables $x_1, x_2, x_3, \ldots$ where $x_k$ can have three values

$$0, \qquad (\log k)^\mu, \qquad -(\log k)^\mu$$

with the corresponding probabilities

$$1 - \frac{1}{(k + a)(\log(k + a))^\rho}, \qquad \frac{1}{2(k + a)(\log(k + a))^\rho}, \qquad \frac{1}{2(k + a)(\log(k + a))^\rho},$$

$a$ being a sufficiently large constant. Moreover, $\mu$ and $\rho$ satisfy the inequality

$$2\mu - \rho + 1 > 0.$$

Show (a) that Liapounoff's condition is satisfied when $\rho < 1$, and hence the limit theorem holds; (b) that this condition is not satisfied if $\rho \ge 1$, and at the same time the limit theorem fails at least for $\rho > 1$.

Solution. a. By using Euler's formula we find




Hence the first part is answered.

b. The probability of the inequality

$$x_1 + x_2 + \cdots + x_n \ne 0$$

is less than

$$\sum_{k=1}^{\infty}\frac{1}{(k + a)(\log(k + a))^\rho},$$

and this, in case $\rho > 1$, is less than

$$\frac{(\log a)^{1-\rho}}{\rho - 1}.$$

Hence the probability of the equality

$$x_1 + x_2 + \cdots + x_n = 0$$

remains always $> 1 - \frac{(\log a)^{1-\rho}}{\rho - 1}$, and the limit theorem cannot hold. Note that $B_n \to \infty$ because $2\mu - \rho + 1 > 0$.

4. Prove the asymptotic formula

$$1 + \frac{n}{1} + \frac{n^2}{1\cdot 2} + \cdots + \frac{n^n}{1\cdot 2\cdots n} \sim \frac{1}{2}e^n,$$

$n$ being a large integer.

Hint: Apply Liapounoff's theorem to $n$ variables distributed according to Poisson's law with parameter 1.
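The asymptotic formula of Problem 4 can be checked numerically: $e^{-n}\left(1 + n + \cdots + n^n/n!\right)$ is the probability that a Poisson variable with parameter $n$ does not exceed $n$, and it should approach $\frac{1}{2}$. In this sketch (the function name is a hypothetical choice) the terms are summed in log space to avoid overflow.

```python
import math


def scaled_partial_sum(n):
    """e^{-n} (1 + n/1 + n^2/2! + ... + n^n/n!), each term formed via logarithms."""
    log_n = math.log(n)
    return sum(math.exp(k * log_n - math.lgamma(k + 1) - n) for k in range(n + 1))


for n in (10, 100, 10_000):
    print(n, scaled_partial_sum(n))
```

The values approach $\frac{1}{2}$ slowly from above, the excess being of the order $1/\sqrt{n}$.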

5. By resorting to the fundamental lemma, prove the following theorem due to Markoff: If for a variable $s_n$ with the mean $= 0$ and the standard deviation $= 1$

$$\lim_{n\to\infty} E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u^ke^{-\frac{u^2}{2}}\,du$$

for any given $k = 3, 4, 5, \ldots$, then the probability of the inequality $s_n < t$ tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

6. In many special cases the limit of the error term can be considerably lower than that given in Sec. 6. For instance, if variables $x_1, x_2, \ldots, x_n$ are identical and uniformly distributed in the interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$, the probability $F_n(t)$ of the inequality

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{n/12}} < t$$

differs from

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

by less (in absolute value) than

i+!{?)• 

7-5n ’r\ir/ ir’n 

the last two terms being completely negligible for somewhat large n. 
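Indication of the Proof. First establish the inequalities

$$e^{-\frac{\varphi^2}{6}} > \frac{\sin\varphi}{\varphi} > e^{-\frac{\varphi^2}{6} - \frac{\varphi^4}{135}}$$

for $0 < \varphi \le \pi/2$. Further, represent $F_n(t)$ by the integral

$$F_n(t) = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty\left(\frac{\sin v\sqrt{3/n}}{v\sqrt{3/n}}\right)^n\frac{\sin tv}{v}\,dv$$

and split it into two integrals, taken between $0$ and $\pi\sqrt{n}/\sqrt{12}$ and between $\pi\sqrt{n}/\sqrt{12}$ and $\infty$.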

7. Supposing again that $x_1, x_2, \ldots, x_n$ are identical and uniformly distributed in the interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$, prove that for $n \ge 2$

$$E|x_1 + x_2 + \cdots + x_n| = \sqrt{\frac{n}{6\pi}} + \frac{\theta}{60\sqrt{n}}, \qquad 0 < \theta < 1.$$

8. Let $s_n$ be a variable with the mean $= 0$ and standard deviation $= 1$. If its characteristic function $\varphi_n(t)$ tends to $e^{-t^2/2}$ as $n \to \infty$, uniformly in any finite interval, show that

I 2 2 C - •Pnjt) 


Hint: 
9. If independent variables $x_1, x_2, \ldots, x_n$ with means $= 0$ satisfy Liapounoff's condition, prove that


B\xi + + • * * + *f»| 



10. Show that for a simple chain of trials

$$\lim_{n\to\infty}\frac{B_n}{n} = pq\,\frac{1 + \delta}{1 - \delta},$$

$p$ being the mean probability in an infinite series of trials and $\delta = p' - p''$.

11. A series of dependent trials can be illustrated by the following urn scheme: Two urns, 1 and 2, contain white and black balls in such proportions that the probability of drawing a white ball from 1 is $p$, whereas the probability of drawing a white ball from 2 is $q = 1 - p$. Whenever a ball taken from an urn is white, the next ball is taken from the same urn, but if it is black, the next ball is drawn from the other urn. The urn at the first drawing is selected by lot, the probabilities of selecting the first or the second urn being given. Evidently the course of trials is determined by these rules without any ambiguity. Let $m$ denote the number of white balls obtained in $n$ drawings and let

$$a = p^2 + q^2.$$

Show that the probability of the inequality

$$m - na < t\sqrt{La(1 - a)n}, \qquad L = \frac{2(1 - 3pq)}{1 - 2pq},$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

Indication of the Proof. Let

$$P^{(1)}_{m,n}, \quad P^{(2)}_{m,n}, \quad P^{(3)}_{m,n}, \quad P^{(4)}_{m,n}$$

be the probabilities of having $m$ white balls in $n$ trials when (a) the last ball is white and from urn 1; (b) the last ball is white and from urn 2; (c) the last ball is black and from urn 1; and (d) the last ball is black and from urn 2. The sum

$$P_{m,n} = P^{(1)}_{m,n} + P^{(2)}_{m,n} + P^{(3)}_{m,n} + P^{(4)}_{m,n}$$

represents the probability of having exactly $m$ white balls in $n$ trials. The generating functions $\varphi^{(1)}_n(t), \ldots, \varphi^{(4)}_n(t)$ of these probabilities satisfy the following equations:

$$\varphi^{(1)}_{n+1} = pt\left(\varphi^{(1)}_n + \varphi^{(4)}_n\right), \qquad \varphi^{(2)}_{n+1} = qt\left(\varphi^{(2)}_n + \varphi^{(3)}_n\right),$$

$$\varphi^{(3)}_{n+1} = q\left(\varphi^{(1)}_n + \varphi^{(4)}_n\right), \qquad \varphi^{(4)}_{n+1} = p\left(\varphi^{(2)}_n + \varphi^{(3)}_n\right),$$

whence it can be shown that they all, as well as their sum (the generating function of $P_{m,n}$), satisfy the same equation of the second order

$$z_{n+2} - t\,z_{n+1} + pq(t^2 - 1)z_n = 0.$$
Setting $t = e^{iu}$, one of the characteristic roots will be given by

$$e^{(1 - 2pq)iu - 2pq(1 - 3pq)u^2 + \cdots}$$

for small $u$, while the other root tends to 0 as $u \to 0$. The final conclusion can now be reached in the same way as in Examples 1 and 2, pages 297 and 301.
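For the urn scheme of Problem 11, the constant $L$ can be checked exactly. The sketch below is an illustration only: it assumes that the first urn is chosen with probabilities $p$ and $q$ (a particular case of "selected by lot" that makes the scheme stationary), uses hypothetical names, and tracks, for each $m$, which urn the next ball comes from. Under that assumption, consecutive draws have covariance $pq(1 - 4pq)$ and draws two or more apart are uncorrelated, so the per-draw variance is $a(1 - a) + 2pq(1 - 4pq) = La(1 - a)$.

```python
from fractions import Fraction


def urn_distribution(n, p):
    """Exact P_{m,n} for the two-urn scheme.

    G1[m] / G2[m]: probability of m white balls so far AND the next ball
    being drawn from urn 1 / urn 2.  The first urn is assumed to be chosen
    with probabilities p and q.
    """
    q = 1 - p
    G1, G2 = [p], [q]
    for _ in range(n):
        N1 = [Fraction(0)] * (len(G1) + 1)
        N2 = [Fraction(0)] * (len(G1) + 1)
        for m in range(len(G1)):
            N1[m + 1] += G1[m] * p   # urn 1, white: stay with urn 1
            N2[m] += G1[m] * q       # urn 1, black: switch to urn 2
            N2[m + 1] += G2[m] * q   # urn 2, white (prob. q): stay with urn 2
            N1[m] += G2[m] * p       # urn 2, black (prob. p): switch to urn 1
        G1, G2 = N1, N2
    return [g1 + g2 for g1, g2 in zip(G1, G2)]


p = Fraction(1, 4)
q = 1 - p
a = p**2 + q**2
L = 2 * (1 - 3 * p * q) / (1 - 2 * p * q)
n = 10
P = urn_distribution(n, p)
mean = sum(m * pm for m, pm in enumerate(P))
var = sum((m - mean) ** 2 * pm for m, pm in enumerate(P))
# n single-draw variances plus 2(n - 1) lag-one covariances pq(1 - 4pq)
print(mean == n * a, var == n * a * (1 - a) + 2 * (n - 1) * p * q * (1 - 4 * p * q))
print(a * (1 - a) + 2 * p * q * (1 - 4 * p * q) == L * a * (1 - a))
```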
CHAPTER XV


NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT 
THEOREM FOR SUMS OF INDEPENDENT VECTORS. 
ORIGIN OF NORMAL CORRELATION 

1. The concept of normal distribution can easily be extended to two and more variables. Since the extension to more than two variables does not involve new ideas, we shall confine ourselves to the case of two-dimensional normal distribution.

Two variables, $x$, $y$, are said to be normally distributed if for them the density of probability has the form

$$Ke^{-\varphi},$$

where

$$\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f$$

is a quadratic function of $x$, $y$ becoming positive and infinitely large together with $|x| + |y|$. This requirement is fulfilled if, and only if,

$$ax^2 + 2bxy + cy^2$$

is a positive quadratic form. The necessary and sufficient conditions for this are:

$$a > 0; \qquad ac - b^2 = \Delta > 0.$$

Since $\Delta > 0$ (even the milder requirement $\Delta \ne 0$ suffices), constants $x_0$, $y_0$ can be found so that

$$\varphi = a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2 + g$$

identically in $x$, $y$. It follows that the density of probability may be presented thus:

$$K'e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}, \qquad K' = Ke^{-g}.$$

The expression in the right member depends on six parameters $K'$; $a$, $b$, $c$; $x_0$, $y_0$. But the requirement

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} K'e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy = 1$$

reduces the number of independent parameters to five. We can take $a$, $b$, $c$; $x_0$, $y_0$ for independent parameters and determine $K'$ by this condition,






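which, by introducing new variables

$$\xi = x - x_0, \qquad \eta = y - y_0,$$

can be exhibited thus:

$$K'\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = 1.$$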

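To evaluate this and similar double integrals we observe that the positive quadratic form

$$a\xi^2 + 2b\xi\eta + c\eta^2$$

can be presented in infinitely many ways as a sum of two squares

$$(\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2,$$

whence

$$a = \alpha^2 + \gamma^2, \qquad c = \beta^2 + \delta^2, \qquad b = \alpha\beta + \gamma\delta,$$

$$(\alpha\delta - \beta\gamma)^2 = \Delta.$$

By changing the signs of $\alpha$ and $\beta$ if necessary, we can always suppose

$$\alpha\delta - \beta\gamma = +\sqrt{\Delta}.$$

Now we take

$$u = \alpha\xi + \beta\eta, \qquad v = \gamma\xi + \delta\eta$$

for new variables of integration. Since the Jacobian of $u$, $v$ with respect to $\xi$, $\eta$ is $\sqrt{\Delta}$, the Jacobian of $\xi$, $\eta$ with respect to $u$, $v$ will be $1/\sqrt{\Delta}$ and, by the known rules,

$$\frac{K'}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}\,du\,dv = \frac{K'\pi}{\sqrt{\Delta}} = 1, \qquad K' = \frac{\sqrt{\Delta}}{\pi}.$$

That is, the general expression for the density of probability in two-dimensional normal distribution is

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$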


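2. Parameters $x_0$, $y_0$ represent the mean values of variables $x$, $y$. To prove this, let us consider

$$E(x - x_0) = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x - x_0)\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy.$$

To evaluate the double integral, we can express $x$ and $y$ through the new variables $u$, $v$ introduced in the preceding section. We have

$$x - x_0 = \frac{\delta u - \beta v}{\sqrt{\Delta}}, \qquad y - y_0 = \frac{-\gamma u + \alpha v}{\sqrt{\Delta}},$$

whence

$$E(x - x_0) = \frac{1}{\pi\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\delta u - \beta v)\,e^{-u^2 - v^2}\,du\,dv = 0,$$

and similarly

$$E(y - y_0) = 0.$$

Thus

$$E(x) = x_0, \qquad E(y) = y_0.$$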

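3. Having found the meaning of $x_0$, $y_0$, we may consider, instead of $x$, $y$, the variables $x - x_0$, $y - y_0$, whose mean values are $0$. Denoting these new variables by $x$, $y$ again, the expression of the density of probability for $x$, $y$ will be

$$\frac{\sqrt{\Delta}}{\pi}\,e^{-ax^2 - 2bxy - cy^2}.$$

It contains only three parameters, $a$, $b$, $c$. To find the intrinsic meaning of $a$, $b$, $c$, let us consider the mathematical expectation of $(x + \lambda y)^2$, where $\lambda$ is an arbitrary constant. We have

$$E(x + \lambda y)^2 = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x + \lambda y)^2e^{-ax^2 - 2bxy - cy^2}\,dx\,dy$$

or, introducing $u$, $v$ defined as in Sec. 1 as new variables of integration,

$$E(x + \lambda y)^2 = \frac{1}{\pi\Delta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big[(\delta - \lambda\gamma)^2u^2 + 2(\delta - \lambda\gamma)(\lambda\alpha - \beta)uv + (\beta - \lambda\alpha)^2v^2\Big]e^{-u^2 - v^2}\,du\,dv$$

$$= \frac{(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2}{2\Delta} = \frac{\delta^2 + \beta^2}{2\Delta} - 2\lambda\,\frac{\alpha\beta + \gamma\delta}{2\Delta} + \lambda^2\,\frac{\gamma^2 + \alpha^2}{2\Delta}.$$

But

$$\delta^2 + \beta^2 = c, \qquad \gamma^2 + \alpha^2 = a, \qquad \alpha\beta + \gamma\delta = b,$$

whence

$$E(x^2) + 2\lambda E(xy) + \lambda^2E(y^2) = \frac{c}{2\Delta} - 2\lambda\,\frac{b}{2\Delta} + \lambda^2\,\frac{a}{2\Delta},$$

and since $\lambda$ is arbitrary,

$$E(x^2) = \frac{c}{2\Delta}, \qquad E(xy) = -\frac{b}{2\Delta}, \qquad E(y^2) = \frac{a}{2\Delta}.$$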
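On the other hand, if $\sigma_1$, $\sigma_2$, and $r$ are respectively the standard deviations of $x$, $y$ and their correlation coefficient, we have

$$E(x^2) = \sigma_1^2, \qquad E(xy) = r\sigma_1\sigma_2, \qquad E(y^2) = \sigma_2^2.$$

Hence

$$\frac{c}{2\Delta} = \sigma_1^2, \qquad -\frac{b}{2\Delta} = r\sigma_1\sigma_2, \qquad \frac{a}{2\Delta} = \sigma_2^2,$$

and

$$\sigma_1^2\sigma_2^2(1 - r^2) = \frac{ac - b^2}{4\Delta^2} = \frac{1}{4\Delta},$$

or

$$2\Delta = \frac{1}{2\sigma_1^2\sigma_2^2(1 - r^2)}.$$

Finally,

$$a = \frac{1}{2\sigma_1^2(1 - r^2)}, \qquad b = -\frac{r}{2\sigma_1\sigma_2(1 - r^2)}, \qquad c = \frac{1}{2\sigma_2^2(1 - r^2)}, \qquad \sqrt{\Delta} = \frac{1}{2\sigma_1\sigma_2\sqrt{1 - r^2}}.$$

With these values for $a$, $b$, $c$, and $\sqrt{\Delta}$, the density of probability can be presented as follows:

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]},$$

and the probability for a point $x$, $y$ to belong to a given domain $D$ will be expressed by the double integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\iint_D e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}dx\,dy$$

extended over $D$.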
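The relations between $a$, $b$, $c$ and $\sigma_1$, $\sigma_2$, $r$ lend themselves to a direct check. The following sketch (hypothetical helper names and test values $\sigma_1 = 2$, $\sigma_2 = 0.5$, $r = 0.6$) computes $a$, $b$, $c$, $\Delta$ from the formulas above, verifies $E(x^2) = c/2\Delta$, $E(xy) = -b/2\Delta$, $E(y^2) = a/2\Delta$, and also estimates the moments of the density $\frac{\sqrt{\Delta}}{\pi}e^{-ax^2 - 2bxy - cy^2}$ by a plain midpoint rule on a truncated grid.

```python
import math


def normal_coefficients(s1, s2, r):
    """a, b, c and Delta = ac - b^2 in terms of sigma_1, sigma_2, r."""
    d = 1.0 - r * r
    a = 1.0 / (2 * s1**2 * d)
    b = -r / (2 * s1 * s2 * d)
    c = 1.0 / (2 * s2**2 * d)
    return a, b, c, a * c - b * b


def density(x, y, s1, s2, r):
    a, b, c, delta = normal_coefficients(s1, s2, r)
    return math.sqrt(delta) / math.pi * math.exp(-a * x * x - 2 * b * x * y - c * y * y)


def grid_moments(s1, s2, r, steps=240, width=8.0):
    """Midpoint-rule estimates of total mass, E(x^2), E(xy), E(y^2)
    over the rectangle |x| <= width*s1, |y| <= width*s2."""
    hx, hy = 2 * width * s1 / steps, 2 * width * s2 / steps
    mass = exx = exy = eyy = 0.0
    for i in range(steps):
        x = -width * s1 + (i + 0.5) * hx
        for j in range(steps):
            y = -width * s2 + (j + 0.5) * hy
            w = density(x, y, s1, s2, r) * hx * hy
            mass += w
            exx += x * x * w
            exy += x * y * w
            eyy += y * y * w
    return mass, exx, exy, eyy


a, b, c, delta = normal_coefficients(2.0, 0.5, 0.6)
print(c / (2 * delta), -b / (2 * delta), a / (2 * delta))  # 4.0, 0.6, 0.25 up to rounding
print(grid_moments(2.0, 0.5, 0.6))
```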

4. Curves

$$\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right] = l = \text{const.}$$

are evidently similar and similarly placed ellipses with the common center at the origin. For obvious reasons they are called ellipses of equal probability. The area of an ellipse corresponding to a given value of $l$ (ellipse $l$) is

$$\frac{\pi l}{\sqrt{\Delta}} = 2\pi l\sigma_1\sigma_2\sqrt{1 - r^2},$$

whence the area of an infinitesimal ring between ellipses $l$ and $l + dl$ has the expression

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,dl.$$

The infinitesimal probability for a point $x$, $y$ to lie in that ring is expressed by

$$e^{-l}\,dl.$$

Finally, by integrating this expression between limits $l_1$ and $l_2 > l_1$, we find

$$e^{-l_1} - e^{-l_2}$$

as the expression of the probability for $x$, $y$ to belong to the ring between two ellipses $l_1$ and $l_2$. If $l_1 = 0$ and $l_2 = l$,

$$1 - e^{-l}$$

gives the probability for $x$, $y$ to belong to the ellipse $l$.

If $n$ numbers $l, l_1, l_2, \ldots, l_{n-1}$ are determined by the conditions

$$1 - e^{-l} = e^{-l} - e^{-l_1} = e^{-l_1} - e^{-l_2} = \cdots = e^{-l_{n-2}} - e^{-l_{n-1}} = e^{-l_{n-1}} = \frac{1}{n + 1},$$

the whole plane is divided into $n + 1$ regions of equal probability: namely, the interior of the ellipse $l$, the rings between $l, l_1$; $l_1, l_2$; $\ldots$; $l_{n-2}, l_{n-1}$, and, finally, the part of the plane outside of the ellipse $l_{n-1}$.
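Solving the conditions above gives $e^{-l_j} = \frac{n - j}{n + 1}$ for $j = 0, 1, \ldots, n - 1$ (writing $l_0 = l$). The short sketch below (hypothetical function names) computes these levels and re-derives the $n + 1$ region probabilities from $P(\text{ellipse } l) = 1 - e^{-l}$.

```python
import math


def equal_probability_levels(n):
    """Levels l_0 = l, l_1, ..., l_{n-1} with e^{-l_j} = (n - j)/(n + 1)."""
    return [math.log((n + 1) / (n - j)) for j in range(n)]


def region_probabilities(levels):
    """Interior of the first ellipse, the successive rings, and the outside."""
    probs = [1 - math.exp(-levels[0])]
    probs += [math.exp(-lo) - math.exp(-hi) for lo, hi in zip(levels, levels[1:])]
    probs.append(math.exp(-levels[-1]))
    return probs


levels = equal_probability_levels(4)
print([round(l, 4) for l in levels])
print(region_probabilities(levels))   # five values, each equal to 0.2 up to rounding
```

For $n = 1$ this gives the single level $l = \log 2$, the ellipse containing the point with probability $\frac{1}{2}$.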

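5. To find the distribution function of the variable $x$ (without any regard to $y$), we must take for $D$ the domain

$$-\infty < x < t; \qquad -\infty < y < +\infty.$$

As the integral, after the substitution $z = \frac{y}{\sigma_2} - r\frac{x}{\sigma_1}$,

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{-\infty}^{t}\int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]}dx\,dy = \frac{1}{2\pi\sigma_1\sqrt{1 - r^2}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}dx\int_{-\infty}^{\infty} e^{-\frac{z^2}{2(1 - r^2)}}dz = \frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}dx,$$

we see that the probability of the inequality

$$x < t$$

is expressed by

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx.$$

Similarly, the probability of the inequality

$$y < t$$

is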

$$\frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{y^2}{2\sigma_2^2}}\,dy.$$

Thus, if two variables $x$, $y$ are normally distributed with their means $= 0$, each one of them taken separately has a normal distribution of probability with the common mean 0 and the respective standard deviations $\sigma_1$ and $\sigma_2$. Variables $x$ and $y$ are not independent except when $r = 0$. For if they were independent, the probability of the point $x$, $y$ belonging to an infinitesimal rectangle

$$t < x < t + dt; \qquad \tau < y < \tau + d\tau$$

would be

$$\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{t^2}{2\sigma_1^2} - \frac{\tau^2}{2\sigma_2^2}}\,dt\,d\tau,$$

whereas it is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\frac{t^2}{\sigma_1^2} - \frac{2rt\tau}{\sigma_1\sigma_2} + \frac{\tau^2}{\sigma_2^2}\right]}dt\,d\tau,$$

and these expressions are different unless $r = 0$. Thus, except for $r = 0$, normally distributed variables are necessarily dependent in the sense of the theory of probability. Dependent variables are often called "correlated variables." In particular, variables are said to be in "normal correlation" when they are normally distributed.

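6. The probability of the simultaneous inequalities

$$X < x < X', \qquad y < t$$

is represented by the repeated integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_X^{X'}dx\int_{-\infty}^{t} e^{-\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]}dy,$$

while

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

is the probability that $x$ will be contained between $X$ and $X'$. Hence (Chap. XII, Sec. 10) the ratio

$$\frac{\displaystyle\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\left[\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\int_{-\infty}^{t} e^{-\frac{\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}{2\sigma_2^2(1 - r^2)}}dy\right]dx}{\displaystyle\int_X^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx}$$

can be considered as the probability of the inequality

$$y < t,$$

it being known that $x$ is contained between $X$ and $X'$. Considering $X'$ as variable and converging to $X$, the above ratio evidently tends to the limit

$$\frac{1}{\sigma_2\sqrt{2\pi}\sqrt{1 - r^2}}\int_{-\infty}^{t} e^{-\frac{\left(y - r\frac{\sigma_2}{\sigma_1}X\right)^2}{2\sigma_2^2(1 - r^2)}}\,dy,$$

which can be considered as the distribution function of $y$ when $x$ has a fixed value $X$. Hence $y$, for $x = X$, has a normal distribution with the standard deviation

$$\sigma_2\sqrt{1 - r^2}$$

and the mean

$$Y = r\frac{\sigma_2}{\sigma_1}X.$$

Interpreted geometrically, this equation represents the so-called "line of regression" of $y$ on $x$.

In a similar way, we conclude that for $y = Y$ the distribution of $x$ is normal with the standard deviation

$$\sigma_1\sqrt{1 - r^2}$$

and the mean

$$X = r\frac{\sigma_1}{\sigma_2}Y.$$

This equation represents the line of regression of $x$ on $y$.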
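The conditional distribution found above amounts to the factorization of the joint density into the density of $x$ times a normal density in $y$ with mean $r\frac{\sigma_2}{\sigma_1}X$ and standard deviation $\sigma_2\sqrt{1 - r^2}$. The sketch below (hypothetical names; arbitrary test values $\sigma_1 = 1.5$, $\sigma_2 = 0.8$, $r = -0.4$) checks this factorization at a few points.

```python
import math


def joint_density(x, y, s1, s2, r):
    q = (x / s1) ** 2 - 2 * r * (x / s1) * (y / s2) + (y / s2) ** 2
    return math.exp(-q / (2 * (1 - r * r))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))


def normal_pdf(z, mean, sd):
    return math.exp(-((z - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))


def conditional_y_given_x(X, s1, s2, r):
    """Mean (line of regression of y on x) and standard deviation of y for x = X."""
    return r * s2 / s1 * X, s2 * math.sqrt(1 - r * r)


s1, s2, r = 1.5, 0.8, -0.4
for X, y in [(0.0, 0.3), (1.2, -0.7), (-2.0, 1.1)]:
    mean, sd = conditional_y_given_x(X, s1, s2, r)
    lhs = joint_density(X, y, s1, s2, r)
    rhs = normal_pdf(X, 0.0, s1) * normal_pdf(y, mean, sd)
    print(X, y, lhs, rhs)   # the two columns agree up to rounding
```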

LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS

7. So far normal distribution in two dimensions has been considered abstractly without indication of its natural origin. One-dimensional normal distribution may be considered as a limiting case of probability distributions of sums of independent variables. In the same manner two-dimensional normal distribution, or normal correlation, appears as a limit of probability distributions of sums of independent vectors.

Two series of stochastic variables

$$x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n$$

define $n$ stochastic vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ so that $x_i$, $y_i$ represent components of $\mathbf{v}_i$ on two fixed coordinate axes. If

$$E(x_i) = a_i, \qquad E(y_i) = b_i,$$

the vector $\mathbf{a}_i$ with the components $a_i$, $b_i$ is called the mean value of $\mathbf{v}_i$. Evidently the mean value of

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

is represented by the vector

$$\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_n,$$

and that of $\mathbf{v} - \mathbf{a}$ is a vanishing vector. Without loss of generality we may assume at the outset that

$$E(x_i) = E(y_i) = 0; \qquad i = 1, 2, \ldots, n,$$

in which case $E(\mathbf{v}) = 0$. Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are said to be independent if the variables $x_i$, $y_i$ are independent of the rest of the variables $x_j$, $y_j$ where $j \ne i$.

In what follows we shall deal exclusively with independent vectors. 

8. As before, let $x_k$, $y_k$ be components of the vector $\mathbf{v}_k$ $(k = 1, 2, \ldots, n)$. Then

$$X = x_1 + x_2 + \cdots + x_n, \qquad Y = y_1 + y_2 + \cdots + y_n$$

will be the components of the sum

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n.$$

If

$$E(x_k) = E(y_k) = 0, \qquad E(x_k^2) = b_k, \qquad E(y_k^2) = c_k, \qquad E(x_ky_k) = d_k,$$

then

$$E(X) = 0, \qquad E(Y) = 0,$$

$$E(X^2) = b_1 + b_2 + \cdots + b_n = B_n, \qquad E(Y^2) = c_1 + c_2 + \cdots + c_n = C_n,$$

$$E(XY) = d_1 + d_2 + \cdots + d_n = r_n\sqrt{B_nC_n},$$

because

$$E(x_iy_j) = 0 \quad \text{if } j \ne i,$$

variables $x_i$ and $y_j$ being independent.

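Let us introduce, instead of the variables $x_k$, $y_k$ $(k = 1, 2, \ldots, n)$, new variables

$$\xi_k = \frac{x_k}{\sqrt{B_n}}, \qquad \eta_k = \frac{y_k}{\sqrt{C_n}},$$

and correspondingly

$$s = \frac{X}{\sqrt{B_n}}, \qquad \sigma = \frac{Y}{\sqrt{C_n}}$$

instead of $X$, $Y$. We shall have:

$$E(\xi_k) = E(\eta_k) = 0, \qquad E(\xi_k^2) = \frac{b_k}{B_n}, \qquad E(\eta_k^2) = \frac{c_k}{C_n},$$

and

$$E(s) = E(\sigma) = 0, \qquad E(s^2) = E(\sigma^2) = 1, \qquad E(s\sigma) = r_n.$$

The quantity $r_n$, the correlation coefficient of $s$ and $\sigma$, is in absolute value $\le 1$. We define

$$\varphi(u, v) = E\left[e^{i(us + v\sigma)}\right]$$

as the characteristic function of the vector $(s, \sigma)$. Evidently $\varphi(u, 0)$ and $\varphi(0, v)$ are respectively the characteristic functions of $s$ and $\sigma$. Since

$$e^{i(us + v\sigma)} = \prod_{k=1}^{n} e^{i(u\xi_k + v\eta_k)}$$

and the factors in the right-hand member represent independent variables, we shall have

$$\varphi(u, v) = \prod_{k=1}^{n} E\left[e^{i(u\xi_k + v\eta_k)}\right].$$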

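9. For what follows it is very important to investigate the behavior of $\varphi(u, v)$ when $n$ increases indefinitely while $u$, $v$ do not exceed an arbitrary but fixed number $l$ in absolute value.

Let

$$E|x_k|^3 = f_k, \qquad E|y_k|^3 = g_k$$

and

$$\omega_n = \frac{f_1 + f_2 + \cdots + f_n}{B_n^{3/2}}, \qquad \eta_n = \frac{g_1 + g_2 + \cdots + g_n}{C_n^{3/2}}.$$

If $\omega_n$ and $\eta_n$ tend to 0 as $n \to \infty$, we shall have

$$\left|\varphi(u, v) - e^{-\frac{1}{2}(u^2 + 2r_nuv + v^2)}\right| \le e^{4l^3(\omega_n + \eta_n)} - 1 \tag{1}$$

provided

$$|u| \le l, \qquad |v| \le l$$

and $n$ is so large as to make

$$2l^2\left(\omega_n^{2/3} + \eta_n^{2/3}\right) < 1.$$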

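Since

$$E\left(e^{i(u\xi_k + v\eta_k)}\right) = 1 + iE(u\xi_k + v\eta_k) - \frac{1}{2}E(u\xi_k + v\eta_k)^2 + \frac{\theta}{6}E|u\xi_k + v\eta_k|^3,$$

we shall have

$$E\left(e^{i(u\xi_k + v\eta_k)}\right) = 1 - \frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n} + \frac{\theta}{6}E|u\xi_k + v\eta_k|^3, \qquad |\theta| \le 1.$$

On the other hand,

$$e^{-\frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n}} = 1 - \frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n} + \frac{\theta'}{8}\left[E(u\xi_k + v\eta_k)^2\right]^2, \qquad |\theta'| \le 1,$$

and so

$$E\left(e^{i(u\xi_k + v\eta_k)}\right) = e^{-\frac{1}{2}E(u\xi_k + v\eta_k)^2}\left\{1 + e^{\frac{1}{2}E(u\xi_k + v\eta_k)^2}\left[\frac{\theta}{6}E|u\xi_k + v\eta_k|^3 - \frac{\theta'}{8}\left[E(u\xi_k + v\eta_k)^2\right]^2\right]\right\}.$$

Furthermore,

$$e^{\frac{1}{2}E(u\xi_k + v\eta_k)^2} < e^{\frac{1}{2}} < 2,$$

because

$$E(u\xi_k + v\eta_k)^2 \le 2\left(u^2E(\xi_k^2) + v^2E(\eta_k^2)\right) \le 2l^2\left(\omega_n^{2/3} + \eta_n^{2/3}\right) < 1,$$

$$E(\xi_k^2) = \frac{b_k}{B_n} \le \omega_n^{2/3}, \qquad E(\eta_k^2) = \frac{c_k}{C_n} \le \eta_n^{2/3}.$$

Also

$$\left[E(u\xi_k + v\eta_k)^2\right]^2 \le \left[E(u\xi_k + v\eta_k)^2\right]^{3/2} \le E|u\xi_k + v\eta_k|^3,$$

$$E|u\xi_k + v\eta_k|^3 \le 4l^3\left(\frac{f_k}{B_n^{3/2}} + \frac{g_k}{C_n^{3/2}}\right).$$

Taking into account these various inequalities, we may write

$$E\left(e^{i(u\xi_k + v\eta_k)}\right) = e^{-\frac{u^2b_k}{2B_n} - \frac{uvd_k}{\sqrt{B_nC_n}} - \frac{v^2c_k}{2C_n}}(1 + \rho_k), \qquad |\rho_k| \le 4l^3\left(\frac{f_k}{B_n^{3/2}} + \frac{g_k}{C_n^{3/2}}\right).$$

Finally,

$$\varphi(u, v) = e^{-\frac{1}{2}(u^2 + 2r_nuv + v^2)}(1 + \rho_1)(1 + \rho_2)\cdots(1 + \rho_n)$$

and

$$\left|\varphi(u, v) - e^{-\frac{1}{2}(u^2 + 2r_nuv + v^2)}\right| \le e^{|\rho_1| + |\rho_2| + \cdots + |\rho_n|} - 1 \le e^{4l^3(\omega_n + \eta_n)} - 1,$$

as was stated.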

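10. Theorem. Let $P$ denote the probability of the simultaneous inequalities

$$t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1.$$

Provided $r_n$ remains less than a fixed number $a < 1$ in absolute value and the above introduced quantities $\omega_n$, $\eta_n$ tend to 0 as $n \to \infty$, $P$ can be expressed as

$$P = \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{t^2 - 2r_nt\tau + \tau^2}{2(1 - r_n^2)}}\,dt\,d\tau + \Delta_n,$$

where $\Delta_n$ tends to 0 uniformly in $t_0$, $t_1$; $\tau_0$, $\tau_1$.

If, in addition, $r_n$ itself tends to the limit $r$ $(|r| < 1)$, $P$ will tend uniformly to

$$\frac{1}{2\pi\sqrt{1 - r^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{t^2 - 2rt\tau + \tau^2}{2(1 - r^2)}}\,dt\,d\tau.$$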


Proof. a. In trying to extend Liapounoff's proof to the present case we introduce an auxiliary quantity $\Pi$ defined as



Using the inequality 

— 7 =: I e~^'dt < for X > 0, 

one can easily derive the following inequalities; 
if

$$t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1, \tag{3}$$

and 



if at least one of the inequalities (3) is not fulfilled. From the definition of $\Pi$, $P$ and from (2) and (4) it follows that


|P - nl < C * *) + e ('** *) + e C‘* ') + e (V)’), 

But referring to (1) and setting

$$e^{4l^3(\omega_n + \eta_n)} - 1 = \alpha_n(l),$$

we have, by virtue of the developments in Chap. XIV, Sec. 3,

(5) IP - n| < 2a.(J) + A V2 + ^ 

b. Replacing $t_1$, $\tau_1$ by variable quantities $t$, $\tau$ and taking the second derivative of $\Pi$ with respect to $t$ and $\tau$, we get


dtdr 



On the other hand 




-e 

T 


4 irv- -J- 






whence 

S = J- 

Here we substitute

$$\varphi(u, v) = e^{-\frac{1}{2}(u^2 + 2r_nuv + v^2)} + g(u, v).$$

For all real $u$, $v$

$$|g(u, v)| \le 2.$$

If $|u| \le l$, $|v| \le l$, where $l$ is an arbitrarily fixed number, and $n$ is large enough, we have

$$|g(u, v)| \le \alpha_n(l).$$
Hence, the double integral







extended over the region outside of the square $|u| \le l$, $|v| \le l$ is less than







^ rdr < 


h * i * 

e ^ 


in absolute value. The same double integral extended over the square $|u| \le l$, $|v| \le l$ is less than



in absolute value. Thus, referring to (6) 


dm 

dtdr 


- jLr“ r“ 

4 ir*J- 


~4^(ii*+»«)-s(u*-f2r»ut»+t.*) ^ V , , . 


and 


hH * 


li?l < ' 




Now 


and 


= 1 _ |X| < 1 


-- + - 5 (I^' ^ 

Hence 



g-|(w«+2r,uv+v*)g-»(lu+rt»)^l^j, ^ 


and 


ifi'i 




(M)« 

4 






4t(1 - «*)» 


By transformation to new variables

$$\xi = u + r_nv, \qquad \eta = v\sqrt{1 - r_n^2},$$

the foregoing double integral becomes


2Wl - rt 


so that finally


oral ^ _J__ 

dtdr 2,rvT^„ ^ • 

Integrating this expression with respect to $t$ and $\tau$ between limits $t_0$, $t_1$ and $\tau_0$, $\tau_1$, we get:


( 7 ) n 






(8) IpI < (h - <«)(t, - r„) + 


47r(l — 


Hence combining inequality (5) with (7) and (8), 

P = —f '"'-'dtdx + A, 

2Tr\/l — r?Jto Jto 


hi 


(*/)» 

(<1 — io)(Ti — To)la»(0 + + 


+ {h - to)(r, - r.)l +hV2 + (<- - - ;o)^^ 

‘ 4ir(l - 


Considering $t_0$, $t_1$; $\tau_0$, $\tau_1$ as variable and denoting an arbitrarily large number by $L$, we shall assume at first that the rectangle $D$

$$t_0 \le s \le t_1, \qquad \tau_0 \le \sigma \le \tau_1$$

is completely contained in the square $Q$:

$$|s| \le L, \qquad |\sigma| \le L.$$

Then, taking h = we shall have 




(0 + U 2 + 4 L»\ + V 2 J 2 + 


ir(l - o *)5 
Given an arbitrary positive number $\varepsilon$, we take $l$ so large as to have


U 



+ V2J ^ + 


LH-' 

»(! - a*)* 


< 


1 

r- 


After that, since $\alpha_n(l) \to 0$ as $n \to \infty$ (for a fixed $l$), we can find a number $n_0(\varepsilon)$ so that

.. 8 ) < ^2 + 

for $n > n_0(\varepsilon)$. Finally, we shall have

$$|\Delta_n| < \varepsilon$$

as soon as $n > n_0(\varepsilon)$; that is, $\Delta_n$ tends to 0 uniformly in any rectangle $D$ contained in the square $Q$ with an arbitrarily large side $2L$.

c. To prove that $\Delta_n$ tends to 0 uniformly no matter what $t_0$, $t_1$; $\tau_0$, $\tau_1$ are, we observe that the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint e^{-\frac{t^2 - 2r_nt\tau + \tau^2}{2(1 - r_n^2)}}\,dt\,d\tau$$

extended over the area outside of $Q$ becomes infinitesimal as $L \to \infty$. Accordingly, we take $L$ so large as to make this integral $< \varepsilon/2$ (no matter what $n$ is) and in addition to have $1/L^2 < \varepsilon/4$. The number $L$ selected according to these requirements will be kept fixed.

Let $D'$ represent that part of $D$ which is inside $Q$, the remaining part or parts (if there are any) being $D''$. Let $P'$ and $P''$ denote the probabilities that the point $(s, \sigma)$ shall be contained in $D'$ or $D''$, respectively. Also, let $J'$ and $J''$ be the integrals

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint e^{-\frac{t^2 - 2r_nt\tau + \tau^2}{2(1 - r_n^2)}}\,dt\,d\tau$$

extended over $D'$ and $D''$, respectively. By what has been proved, given $\varepsilon > 0$ a number $n_0(\varepsilon)$ can be found so that

$$|P' - J'| < \varepsilon$$

for $n > n_0(\varepsilon)$. Now

$$P = P' + P''; \qquad J = J' + J'',$$

whence

$$|P - J| < \varepsilon + P'' + J''$$

for $n > n_0(\varepsilon)$. Since by Tshebysheff's lemma (Chap. X, Sec. 1) the probability of either one of the inequalities

$$|s| > L \quad \text{or} \quad |\sigma| > L$$

is less than $1/L^2$, we shall have

$$P'' < \frac{2}{L^2} < \frac{\varepsilon}{2}.$$

Also,

$$J'' < \frac{\varepsilon}{2},$$

whence

$$|P - J| < 2\varepsilon$$

for $n > n_0(\varepsilon)$; that is, the difference

$$P - \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{t^2 - 2r_nt\tau + \tau^2}{2(1 - r_n^2)}}\,dt\,d\tau$$

tends to 0 uniformly, no matter what $t_0$, $t_1$; $\tau_0$, $\tau_1$ are.

Finally, the last statement of the theorem appears as almost evident 
and does not require an elaborate proof. 

11. The theorem just proved concerns the asymptotic behavior of the probability $P$ of the simultaneous inequalities

$$t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1,$$

which, due to the definition of $s$ and $\sigma$, are equivalent to the inequalities

$$t_0\sqrt{B_n} \le x_1 + x_2 + \cdots + x_n < t_1\sqrt{B_n}, \qquad \tau_0\sqrt{C_n} \le y_1 + y_2 + \cdots + y_n < \tau_1\sqrt{C_n}.$$

From the geometrical standpoint the above domain of $s$, $\sigma$ is a rectangle. But the theorem can be extended to the case of any given domain $R$ for the point $(s, \sigma)$. It is hardly necessary to enter into details of the proof, based on the definition of a double integral. It suffices to state the theorem itself:

Fundamental Theorem. The probability for the point $(s, \sigma)$ to be located in a given domain $R$ can be represented, for large $n$, by the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint_R e^{-\frac{t^2 - 2r_nt\tau + \tau^2}{2(1 - r_n^2)}}\,dt\,d\tau$$

extended over $R$, with an error which tends uniformly to 0 as $n$ becomes infinite, provided

$$\omega_n \to 0, \qquad \eta_n \to 0,$$

while for all $n$

$$|r_n| < a < 1.$$

In less precise terms we may say that under very general conditions 
the probability distribution of the components of a vector which is the 
sum of a great many independent vectors will be nearly normal. 

The first rigorous proof of the limit theorem for sums of independent vectors was published by S. Bernstein in 1926. Like the proof developed here, it proceeds on the same lines as Liapounoff's proof for sums of independent variables. Moreover, Bernstein has shown that the limit theorem may hold even in the case of dependent vectors when certain additional conditions are fulfilled.

12. A good illustration of the fundamental theorem is afforded by a series of independent trials with three alternatives $E$, $F$, $G$. For the sake of simplicity we shall assume that the probabilities of $E$, $F$, $G$ are $p$, $q$, $r$ in all trials. Naturally

$$p + q + r = 1.$$

In the usual way, we associate with these trials triads of variables

$$x_i, \; y_i, \; z_i \qquad (i = 1, 2, 3, \ldots)$$

so that

$x_i = 1$ or $0$ according as $E$ occurs or fails at the $i$th trial;

$y_i = 1$ or $0$ according as $F$ occurs or fails at the $i$th trial;

$z_i = 1$ or $0$ according as $G$ occurs or fails at the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p, \qquad E(y_i) = E(y_i^2) = q,$$

so that vectors $\mathbf{v}_i$ with components

$$\xi_i = x_i - p, \qquad \eta_i = y_i - q$$

have their means $= 0$. The independence of trials involves the independence of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$. Hence we can apply the preceding considerations to the vector

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

with the components

$$X = \xi_1 + \xi_2 + \cdots + \xi_n, \qquad Y = \eta_1 + \eta_2 + \cdots + \eta_n.$$

We have

$$B_n = E(X^2) = np(1 - p); \qquad C_n = E(Y^2) = nq(1 - q).$$
Moreover,

$$E(\xi_i\eta_i) = E(x_iy_i) - pq = -pq$$

and

$$E(XY) = r_n\sqrt{B_nC_n} = -npq,$$

whence

$$r_n = -\sqrt{\frac{pq}{(1 - p)(1 - q)}}.$$

Since $(1 - p)(1 - q) = pq + r$, we have $|r_n| < 1$, and $r_n$ does not depend on $n$. The quantities denoted by $f_k$, $g_k$ in Sec. 9 are in our case

$$f_k = E|\xi_k|^3 = p(1 - p)^3 + (1 - p)p^3, \qquad g_k = E|\eta_k|^3 = q(1 - q)^3 + (1 - q)q^3.$$

Hence

$$\omega_n = \frac{p(1 - p)^3 + (1 - p)p^3}{\sqrt{n}\,p^{3/2}(1 - p)^{3/2}}, \qquad \eta_n = \frac{q(1 - q)^3 + (1 - q)q^3}{\sqrt{n}\,q^{3/2}(1 - q)^{3/2}},$$

and the conditions

$$\omega_n \to 0, \qquad \eta_n \to 0$$

are satisfied. The fundamental theorem, therefore, can be applied. If $k$, $l$, $m$ are the respective frequencies of events $E$, $F$, $G$ in $n$ trials, the quantities $X$ and $Y$ represent the discrepancies

$$\lambda = k - np, \qquad \mu = l - nq.$$

Introducing the third discrepancy

$$\nu = m - nr,$$

we shall have

$$\lambda + \mu + \nu = 0,$$

so that $\nu$ is determined when $\lambda$ and $\mu$ are given. The last two quantities, however, may have various values depending on chance. Concerning them the following statement follows from the fundamental theorem:

Theorem. The probability that the discrepancies $\lambda$, $\mu$ in $n$ trials shall simultaneously satisfy the inequalities

$$\alpha_0\sqrt{n} < \lambda < \alpha_1\sqrt{n}; \qquad \beta_0\sqrt{n} < \mu < \beta_1\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{2\pi\sqrt{pqr}}\int_{\alpha_0}^{\alpha_1}\int_{\beta_0}^{\beta_1} e^{-\frac{1}{2}\left(\frac{\alpha^2}{p} + \frac{\beta^2}{q} + \frac{\gamma^2}{r}\right)}\,d\alpha\,d\beta,$$

where, to have symmetrical notation, $\gamma$ is a variable defined by

$$\alpha + \beta + \gamma = 0.$$

On account of symmetry, perfectly similar statements can be made in regard to any two of the three discrepancies $\lambda$, $\mu$, $\nu$.
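The exponent in this theorem is the quadratic form inverse to the covariance matrix of $(\lambda, \mu)/\sqrt{n}$, namely $\begin{pmatrix} p(1-p) & -pq \\ -pq & q(1-q)\end{pmatrix}$, whose determinant is $pqr$. The sketch below (hypothetical helper names; the values $p = 0.2$, $q = 0.5$ are arbitrary) checks this identity and evaluates $r_n$ and $\sqrt{n}\,\omega_n$, $\sqrt{n}\,\eta_n$ from the formulas above.

```python
import math


def trinomial_parameters(p, q):
    """B_n/n, C_n/n, r_n, and sqrt(n)*omega_n, sqrt(n)*eta_n for three alternatives."""
    r_n = -math.sqrt(p * q / ((1 - p) * (1 - q)))
    omega_root_n = (p * (1 - p) ** 3 + (1 - p) * p**3) / (p * (1 - p)) ** 1.5
    eta_root_n = (q * (1 - q) ** 3 + (1 - q) * q**3) / (q * (1 - q)) ** 1.5
    return p * (1 - p), q * (1 - q), r_n, omega_root_n, eta_root_n


def limit_exponent(alpha, beta, p, q):
    """alpha^2/p + beta^2/q + gamma^2/r with gamma = -(alpha + beta), r = 1 - p - q."""
    r = 1 - p - q
    gamma = -(alpha + beta)
    return alpha**2 / p + beta**2 / q + gamma**2 / r


def inverse_covariance_form(alpha, beta, p, q):
    """(alpha, beta) M^{-1} (alpha, beta)^T with M = [[p(1-p), -pq], [-pq, q(1-q)]]."""
    m11, m12, m22 = p * (1 - p), -p * q, q * (1 - q)
    det = m11 * m22 - m12 * m12          # equals pqr
    return (m22 * alpha**2 - 2 * m12 * alpha * beta + m11 * beta**2) / det


p, q = 0.2, 0.5
print(trinomial_parameters(p, q))
print(limit_exponent(0.3, -0.4, p, q), inverse_covariance_form(0.3, -0.4, p, q))
```

Since $\omega_n$ and $\eta_n$ are constants divided by $\sqrt{n}$, they tend to 0, as required by the fundamental theorem.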

Since the fundamental theorem and its proof can be extended without any difficulty to vectors of more than two dimensions, we shall have in the case of trials with more than three alternatives a result perfectly analogous to the last theorem.

Theorem. Each of $n$ independent trials admits of $k$ alternatives $E_1, E_2, \ldots, E_k$, the probabilities and the frequencies of which respectively are $p_1, p_2, \ldots, p_k$ and $m_1, m_2, \ldots, m_k$. The probability that the discrepancies $m_i - np_i$ $(i = 1, 2, \ldots, k - 1)$ should satisfy simultaneously the inequalities

$$\alpha_i\sqrt{n} < m_i - np_i < \beta_i\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int_{\alpha_1}^{\beta_1}\cdots\int_{\alpha_{k-1}}^{\beta_{k-1}} e^{-\frac{1}{2}\left(\frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k}\right)}\,dt_1\,dt_2\cdots dt_{k-1},$$

where

$$t_k = -(t_1 + t_2 + \cdots + t_{k-1}).$$

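From this theorem, by resorting to the definition of a multiple integral,
we may deduce an important corollary. Let $P_n$ denote the probability of the
inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} < \chi^2. \qquad (A)$$

Then, as n tends to infinity, $P_n$ tends to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int\cdots\int e^{-\frac12\varphi}\,dt_1\,dt_2\cdots dt_{k-1}$$

where the integration is extended over the $(k-1)$-dimensional ellipsoid

$$\varphi = \frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k} < \chi^2.$$

It is easy to see that the determinant of the quadratic form $\varphi$ in
$(k-1)$ variables is $(p_1p_2\cdots p_k)^{-1}$. Hence, by a proper linear transformation the above integral reduces to

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}}\int\cdots\int e^{-\frac12(y_1^2 + y_2^2 + \cdots + y_{k-1}^2)}\,dy_1\,dy_2\cdots dy_{k-1},$$

the domain of integration being $y_1^2 + y_2^2 + \cdots + y_{k-1}^2 < \chi^2$. But
this multiple integral, as will be shown in Chap. XVI, Sec. 1, can be
reduced to a simple integral

$$\frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi^2} e^{-\frac u2}u^{\frac{k-3}{2}}\,du.$$

Thus

$$\lim_{n\to\infty} P_n = \frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi^2} e^{-\frac u2}u^{\frac{k-3}{2}}\,du.$$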


The probability $Q_n = 1 - P_n$ of the opposite inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} \ge \chi^2$$

tends to the limit

$$\frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi^2}^{\infty} e^{-\frac u2}u^{\frac{k-3}{2}}\,du,$$

and for large n we have an approximate formula

$$Q_n \approx \frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi^2}^{\infty} e^{-\frac u2}u^{\frac{k-3}{2}}\,du,$$

but the degree of approximation remains unknown. In practice, to
test whether the observed deviations of frequencies from their expected
values are significant, the value of the sum (A), say $\chi^2$, is found; then
by the above approximate formula the probability that the sum (A) will
be greater than $\chi^2$ is computed. If this probability is very small, then
the obtained system of deviations is significantly different from what
could be expected as a result of chance alone. The lack of information
as to the error incurred by using an approximate expression of $Q_n$ renders
the application of this "$\chi^2$ test" devised by Pearson somewhat dubious.
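A minimal sketch of the procedure just described, using only Python's standard library. It computes the sum (A) and then the limiting tail probability with k - 1 degrees of freedom through the power series of the incomplete gamma function; this series is a standard numerical device chosen here for illustration, not the text's own method. The die counts are hypothetical. As the text warns, the result is only the limiting value, and nothing here bounds the error for finite n.

```python
import math

def pearson_chi2(counts, probs):
    """The sum (A): sum of (m_i - n p_i)^2 / (n p_i) over all k alternatives."""
    n = sum(counts)
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))

def chi2_tail(x: float, dof: int) -> float:
    """Limit of Q_n: 1/(2^(d/2) Gamma(d/2)) * int_x^inf e^(-u/2) u^(d/2 - 1) du.
    Computed as 1 - P(d/2, x/2), where the regularized lower incomplete gamma P
    comes from gamma(a, z) = z^a e^(-z) * sum_j z^j / (a (a+1) ... (a+j))."""
    a, z = dof / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    j = 0
    while term > 1e-17 * total:
        j += 1
        term *= z / (a + j)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return 1.0 - lower

counts = [18, 22, 16, 25, 20, 19]    # hypothetical die, n = 120 throws
probs = [1 / 6] * 6
x2 = pearson_chi2(counts, probs)     # about 2.5
print(x2, chi2_tail(x2, len(counts) - 1))
```

Computing the tail as 1 - P loses relative accuracy when the tail is extremely small; for the moderate values met in such tests this is not an issue.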


Hypothetical Explanation of Empirically Verified Cases of
Normal Correlation

13. Normal distribution in two dimensions plays an important part
in target practice. It is generally assumed on the basis of varied evidence
collected in actual target practice that points of a target hit by projectiles
are scattered in a manner suggesting normal distribution. By referring
points hit by projectiles to a fixed coordinate system on the target, it is
possible from their coordinates to find approximately (provided the
number of shots is large) the elements of ellipses of equal probability.
When the surface of the target is divided into regions of equal probability as
described in Sec. 4 and the actual number of hits in each region is counted,
the resulting numbers in many reported instances are nearly
equal. That and the agreement with other criteria are generally considered
as evidence in favor of assuming the probability in target
practice to be normally distributed.

Two-dimensional normal distribution or normal correlation has been 
found to exist between measurable attributes, such as the length of the 
body and weight of living organisms. Attributes like statures of parents 
and their descendants, according to Galton, again show evidence of 
normal correlation. 

Facing such a variety of facts pointing to the existence of normal 
correlation, one is tempted to account for it by some more or less plausible 
hypothesis. It is generally assumed that deviations of two magnitudes 
from their mean values are caused by the combined action of a great 
many independent causes, each affecting both magnitudes in a very small 
degree. Clearly, the resulting deviations under such circumstances may 
be regarded as components of the sum of a great many independent 
vectors. Then, to explain the existence of normal correlation, reference 
is made to the fundamental theorem in Sec. 11. 


Problems for Solution 

1. Let p denote the probability that two normally distributed variables (with
means = 0) will have values of opposite signs. Show that between p and the correlation
coefficient r the following relation holds:

$$r = \cos p\pi.$$
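A Monte Carlo sketch lets one see the relation $r = \cos p\pi$ numerically. The construction $y = rx + \sqrt{1-r^2}\,z$, with x and z independent standard normal variables, is our assumed way of producing correlation r; with a fixed seed the run is reproducible, but agreement holds only within sampling error.

```python
import math
import random

def opposite_sign_fraction(r: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of p = P(x y < 0) for standard normal x, y with
    correlation coefficient r and means 0."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - r * r)
    opposite = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = r * x + c * rng.gauss(0.0, 1.0)   # corr(x, y) = r by construction
        if x * y < 0:
            opposite += 1
    return opposite / trials

r = 0.6
p_hat = opposite_sign_fraction(r)
print(p_hat, math.cos(math.pi * p_hat))  # the second value should be near r
```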


2. Variables x, y (with the means = 0) are normally distributed. Show that the
probability for the point (x, y) to be located in an ellipse

$$\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \le c^2 \qquad (c \text{ a constant})$$

is greater than the probability corresponding to any other domain of the same area.

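3. Three dice colored white, red, and blue are tossed simultaneously n times.
Let X and Y represent the total number of points on the pairs white, red and white, blue.
Show that the probability of the simultaneous inequalities

$$7n + t_0\sqrt{n} < X < 7n + t_1\sqrt{n}, \qquad 7n + \tau_0\sqrt{n} < Y < 7n + \tau_1\sqrt{n}$$

tends to the limit

$$\frac{6}{35\sqrt{3}\,\pi}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{4}{35}(x^2 - xy + y^2)}\,dx\,dy.$$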
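4. Three dice, white, red, and blue, are tossed simultaneously n times. If k and l
are the frequencies of 10 points on the pairs white, red and red, blue, show that the probability
of the simultaneous inequalities

$$\frac{n}{12} + t_0\sqrt{n} < k < \frac{n}{12} + t_1\sqrt{n}, \qquad \frac{n}{12} + \tau_0\sqrt{n} < l < \frac{n}{12} + \tau_1\sqrt{n}$$

tends to the limit

$$\frac{144}{2\pi\sqrt{120}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{3}{5}(11x^2 - 2xy + 11y^2)}\,dx\,dy$$

as $n \to \infty$.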
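The limits in Problems 3 and 4 rest on the per-toss means, variances and covariances. This sketch (helper names hypothetical) obtains them exactly by enumerating the 216 equally likely outcomes with fractions. The correlation 1/2 in Problem 3 and 1/11 in Problem 4 are what the quadratic forms in the exponents encode.

```python
from fractions import Fraction
from itertools import product

# all 216 equally likely outcomes (white, red, blue) of one toss of the three dice
outcomes = list(product(range(1, 7), repeat=3))
w = Fraction(1, len(outcomes))

def moments(f, g):
    """Exact mean of f, variance of f and covariance of f, g for one toss."""
    ef = sum(w * f(o) for o in outcomes)
    eg = sum(w * g(o) for o in outcomes)
    var_f = sum(w * (f(o) - ef) ** 2 for o in outcomes)
    cov = sum(w * (f(o) - ef) * (g(o) - eg) for o in outcomes)
    return ef, var_f, cov

# Problem 3: points on (white, red) and on (white, blue)
mean3, var3, cov3 = moments(lambda o: o[0] + o[1], lambda o: o[0] + o[2])
# Problem 4: indicators of 10 points on (white, red) and on (red, blue)
mean4, var4, cov4 = moments(lambda o: int(o[0] + o[1] == 10),
                            lambda o: int(o[1] + o[2] == 10))
print(mean3, var3, cov3 / var3)   # 7 35/6 1/2
print(mean4, var4, cov4 / var4)   # 1/12 11/144 1/11
```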

5. Two players, A and B, take part in a game arranged as follows: Each time one
ball is taken from an urn containing 8 white, 6 black, and 1 red ball; if this ball is

white, A and B both gain $1;
black, A loses $2, B loses $4;
red, A gains $4, B gains $16.

Let $s_n$ and $\sigma_n$ be the sums gained by A and B after n games. Show that the probability
of the simultaneous inequalities

$$t_0\sqrt{16n/5} < s_n < t_1\sqrt{16n/5}, \qquad \tau_0\sqrt{24n} < \sigma_n < \tau_1\sqrt{24n}$$

for very large n will be approximately equal to

$$\frac{\sqrt6}{2\pi}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-3\left(t^2 + \tau^2 - \sqrt{10/3}\,t\tau\right)}\,dt\,d\tau.$$

Note that the probability of the inequality $s_n\sigma_n < 0$ is about 0.13 (not very small),
so that it is not very unlikely that the luck will be with one player and against the other.
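The numbers behind Problem 5 can be verified exactly. The sketch computes the per-game means, variances and covariance of the gains of A and B with fractions, and then uses the relation of Problem 1 (probability of opposite signs = arccos(ρ)/π for correlation ρ) to reproduce the figure of about 0.13.

```python
import math
from fractions import Fraction

# (probability, gain of A, gain of B) for a white, black, red ball (8, 6, 1 of 15)
draws = [(Fraction(8, 15), 1, 1), (Fraction(6, 15), -2, -4), (Fraction(1, 15), 4, 16)]

mean_a = sum(w * a for w, a, _ in draws)
mean_b = sum(w * b for w, _, b in draws)
var_a = sum(w * a * a for w, a, _ in draws) - mean_a ** 2
var_b = sum(w * b * b for w, _, b in draws) - mean_b ** 2
cov = sum(w * a * b for w, a, b in draws) - mean_a * mean_b
rho_sq = cov * cov / (var_a * var_b)

# By Problem 1, P(s_n sigma_n < 0) tends to arccos(rho) / pi
p_opposite = math.acos(math.sqrt(rho_sq)) / math.pi
print(mean_a, mean_b, var_a, var_b, cov, rho_sq, round(p_opposite, 4))
# expected: 0 0 16/5 24 8 5/6 0.1339
```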

6. Concentric circles $C_1, C_2, C_3, \ldots$ in unlimited number are described about
the origin O. Points $P_1, P_2, P_3, \ldots$ are taken at random on these circles. Let R
be the end point of the vector representing the sum of the vectors $OP_1, OP_2, OP_3, \ldots$.
If $r_1, r_2, r_3, \ldots$ are the radii of $C_1, C_2, C_3, \ldots$ and the condition

$$\frac{r_1^3 + r_2^3 + \cdots + r_n^3}{(r_1^2 + r_2^2 + \cdots + r_n^2)^{3/2}} \to 0 \quad \text{as } n \to \infty$$

is fulfilled, show that the probability that R will lie within the circle described with the
radius $\rho$ about the origin will be very nearly equal to

$$1 - e^{-\frac{\rho^2}{r_1^2 + r_2^2 + \cdots + r_n^2}}$$

for large n.
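A simulation sketch for Problem 6. The radii are hypothetical (sixty values cycling through 1, 1.5, 2), and ρ is chosen so that the formula predicts 1 - 1/e. Each point $P_k$ is given a uniformly distributed angle, which is our reading of "at random on these circles."

```python
import math
import random

def prob_inside(radii, rho, trials=10_000, seed=2):
    """Monte Carlo estimate of P(|R| < rho) when R is the sum of vectors OP_k
    with independent uniform directions and lengths equal to the radii."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = y = 0.0
        for r in radii:
            phi = rng.uniform(0.0, 2.0 * math.pi)
            x += r * math.cos(phi)
            y += r * math.sin(phi)
        if x * x + y * y < rho * rho:
            hits += 1
    return hits / trials

radii = [1.0 + 0.5 * (k % 3) for k in range(60)]   # hypothetical radii r_1..r_60
s2 = sum(r * r for r in radii)
rho = math.sqrt(s2)                                 # the formula then gives 1 - 1/e
estimate = prob_inside(radii, rho)
approx = 1.0 - math.exp(-rho * rho / s2)
print(estimate, approx)
```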
CHAPTER XVI


DISTRIBUTION OF CERTAIN FUNCTIONS OF NORMALLY
DISTRIBUTED VARIABLES


1. In modern statistics much emphasis is laid upon distributions of
certain functions involving normally distributed variables. Such distributions
are considered as a basis for various "tests of significance"
for small samples, that is, when the number of observed data is small.
Some of the most important cases of this kind will be considered in this
chapter.

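Problem 1. Independent variables $x_1, x_2, \ldots, x_n$ are normally
distributed about their common mean = 0 with the same standard
deviation $\sigma$. Find the distribution function of the sum of their squares

$$s = x_1^2 + x_2^2 + \cdots + x_n^2.$$

Solution. The inequality

$$x_i^2 < t$$

being equivalent to

$$-\sqrt{t} < x_i < \sqrt{t},$$

the distribution function of $x_i^2$ is

$$F_1(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\sqrt t}^{\sqrt t} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{-\frac12}\,du \quad \text{for } t \ge 0,$$

$$F_1(t) = 0 \quad \text{for } t < 0.$$

Hence, the characteristic function of any one of the variables $x_1^2, x_2^2, \ldots, x_n^2$ is

$$\varphi_1(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_0^\infty e^{-\frac{u}{2\sigma^2} + itu}u^{-\frac12}\,du = (1 - 2i\sigma^2t)^{-\frac12}$$

and that of their sum

$$\varphi(t) = (1 - 2i\sigma^2t)^{-\frac n2}.$$

Consequently, the distribution function of s is expressed by

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-it\xi}}{i\xi}\,(1 - 2i\sigma^2\xi)^{-\frac n2}\,d\xi,$$

and it remains to transform this integral. To this end, imagine a variable
distributed over the interval $(0, +\infty)$ with the density

$$\frac{1}{2^{\frac n2}\sigma^n\Gamma\left(\frac n2\right)}\,e^{-\frac{u}{2\sigma^2}}u^{\frac n2 - 1}.$$

Its characteristic function is

$$\frac{1}{2^{\frac n2}\sigma^n\Gamma\left(\frac n2\right)}\int_0^\infty e^{-\frac{u}{2\sigma^2} + itu}u^{\frac n2 - 1}\,du = (1 - 2i\sigma^2t)^{-\frac n2},$$

and since the distribution function is given a priori, we must have for
$t \ge 0$

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-it\xi}}{i\xi}\,(1 - 2i\sigma^2\xi)^{-\frac n2}\,d\xi = \text{const.} + \frac{1}{2^{\frac n2}\sigma^n\Gamma\left(\frac n2\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac n2 - 1}\,du.$$

Hence

$$F(t) = \text{const.} + \frac{1}{2^{\frac n2}\sigma^n\Gamma\left(\frac n2\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac n2 - 1}\,du.$$

The constant must be = 0 since F(t) as well as the integral in the right
member vanishes for t = 0. The final expression is therefore:

$$F(t) = \frac{1}{2^{\frac n2}\sigma^n\Gamma\left(\frac n2\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac n2 - 1}\,du \quad \text{for } t \ge 0,$$

$$F(t) = 0 \quad \text{for } t \le 0.$$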
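For even n the final expression can be integrated in closed form by repeated integration by parts: $F(t) = 1 - e^{-z}\left(1 + z + \cdots + \frac{z^{n/2-1}}{(n/2-1)!}\right)$ with $z = t/(2\sigma^2)$. The sketch compares that closed form with a simulated frequency; the seed and the values n = 4, σ = 1.5, t = 8 are arbitrary choices.

```python
import math
import random

def F_even(t: float, n: int, sigma: float = 1.0) -> float:
    """F(t) of Problem 1 for even n: 1 - e^(-z) * sum_{k < n/2} z^k / k!,
    with z = t / (2 sigma^2)."""
    if t <= 0:
        return 0.0
    z = t / (2.0 * sigma * sigma)
    return 1.0 - math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(n // 2))

def F_empirical(t, n, sigma=1.0, trials=50_000, seed=3):
    """Fraction of simulated samples with x_1^2 + ... + x_n^2 < t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) < t:
            hits += 1
    return hits / trials

print(F_even(8.0, 4, 1.5), F_empirical(8.0, 4, 1.5))
```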

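The probability of the inequality

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t,$$

on the other hand, can be expressed directly as a multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\cdots\int_S e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the n-dimensional sphere S:

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t.$$

By equating both expressions of F(t), we obtain an important transformation,

$$\int\cdots\int_S e^{-\frac{x_1^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n = \frac{\pi^{\frac n2}}{\Gamma\left(\frac n2\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac n2 - 1}\,du. \qquad (1)$$

If $F(x_1^2 + x_2^2 + \cdots + x_n^2)$ is an arbitrary function of

$$u = x_1^2 + x_2^2 + \cdots + x_n^2,$$

the integral

$$\frac{1}{(\sigma\sqrt{2\pi})^n}\int\cdots\int e^{-\frac{x_1^2 + \cdots + x_n^2}{2\sigma^2}}F(x_1^2 + \cdots + x_n^2)\,dx_1\,dx_2\cdots dx_n$$

extended over the whole n-dimensional space represents the mathematical
expectation of F(u). On the other hand, the distribution function of
u being known, the same multiple integral will be equal to

$$\frac{1}{(\sigma\sqrt2)^n\Gamma\left(\frac n2\right)}\int_0^\infty e^{-\frac{u}{2\sigma^2}}F(u)\,u^{\frac n2 - 1}\,du.$$

Taking in particular $\sigma = 1$, $F(u) = e^{a\sqrt u}$, we get the formula

$$\int\cdots\int e^{-\frac12(x_1^2 + \cdots + x_n^2) + a\sqrt{x_1^2 + \cdots + x_n^2}}\,dx_1\,dx_2\cdots dx_n = \frac{\pi^{\frac n2}}{\Gamma\left(\frac n2\right)}\int_0^\infty e^{-\frac u2 + a\sqrt u}\,u^{\frac n2 - 1}\,du, \qquad (2)$$

which will be used later.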

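2. Problem 2. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1.
Denoting their arithmetic mean by

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n},$$

find the distribution function of the sum

$$S = (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2.$$

Solution. The probability of the inequality

$$S < t$$

is expressed by the multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\cdots\int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the n-dimensional ellipsoid

$$(x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2 < t.$$

Let

$$x_1 - s = u_1, \quad x_2 - s = u_2, \quad \ldots, \quad x_n - s = u_n,$$

whence

$$u_1 + u_2 + \cdots + u_n = 0$$

and

$$x_1^2 + x_2^2 + \cdots + x_n^2 = u_1^2 + u_2^2 + \cdots + u_n^2 + ns^2.$$

Taking $u_1, u_2, \ldots, u_{n-1}$ and s for new variables, we must first find the
Jacobian J of $x_1, x_2, \ldots, x_n$ with respect to $s, u_1, u_2, \ldots, u_{n-1}$. It is

$$J = \begin{vmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ 1 & -1 & -1 & \cdots & -1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ n & 0 & 0 & \cdots & 0 \end{vmatrix} = (-1)^{n+1}n,$$

the second determinant being obtained by adding all rows to the last one; thus the absolute value of J is n.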

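In the new variables the expression for F(t) will be

$$F(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int\cdots\int e^{-\frac{ns^2 + u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,ds\,du_1\,du_2\cdots du_{n-1}$$

and the domain of integration in the space of the new variables is defined
by

$$-\infty < s < \infty,$$

$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 < t.$$

After performing the integration with respect to s, we get

$$F(t) = \frac{\sqrt n}{(\sigma\sqrt{2\pi})^{n-1}}\int\cdots\int e^{-\frac{u_1^2 + \cdots + u_{n-1}^2 + (u_1 + \cdots + u_{n-1})^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1}.$$

The quadratic form

$$\varphi = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2$$

can be represented as a sum of the squares of $(n-1)$ linear forms in the
variables $u_1, u_2, \ldots, u_{n-1}$:

$$\varphi = v_1^2 + v_2^2 + \cdots + v_{n-1}^2.$$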

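The Jacobian

$$\frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})}$$

is the square root of the determinant of the form $\varphi$, which is the same
as the determinant of the linear forms

$$\frac12\frac{\partial\varphi}{\partial u_1} = 2u_1 + u_2 + \cdots + u_{n-1},$$

$$\frac12\frac{\partial\varphi}{\partial u_2} = u_1 + 2u_2 + \cdots + u_{n-1},$$

$$\cdots\cdots\cdots$$

$$\frac12\frac{\partial\varphi}{\partial u_{n-1}} = u_1 + u_2 + \cdots + 2u_{n-1}.$$

Now, in general, for a determinant of order p,

$$\begin{vmatrix} \lambda & 1 & 1 & \cdots & 1 \\ 1 & \lambda & 1 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & \cdots & \lambda \end{vmatrix} = (\lambda - 1)^{p-1}(\lambda + p - 1),$$

so that (taking $\lambda = 2$, $p = n - 1$) the determinant of $\varphi$ is = n, whence

$$\frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})} = \sqrt n, \qquad \frac{\partial(u_1, u_2, \ldots, u_{n-1})}{\partial(v_1, v_2, \ldots, v_{n-1})} = \frac{1}{\sqrt n}.$$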
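The determinant identity used above can be checked exactly with rational arithmetic. This sketch uses plain Gaussian elimination over `fractions.Fraction` (our choice; any exact method would do) and confirms that λ = 2, p = n - 1 gives n.

```python
from fractions import Fraction

def det(matrix) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result

def lam_matrix(lam, p: int):
    """p x p matrix with lam on the diagonal and 1 elsewhere."""
    return [[lam if i == j else 1 for j in range(p)] for i in range(p)]

for p in range(1, 6):
    for lam in (2, 3, Fraction(1, 2), -1):
        assert det(lam_matrix(lam, p)) == (lam - 1) ** (p - 1) * (lam + p - 1)

# lam = 2, p = n - 1 is the matrix of the form phi: its determinant is n
print([int(det(lam_matrix(2, n - 1))) for n in range(2, 9)])   # [2, 3, 4, 5, 6, 7, 8]
```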

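Therefore, taking $v_1, v_2, \ldots, v_{n-1}$ for new variables, F(t) can be expressed
as follows

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}}\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

where the integral is extended over the volume of the sphere
$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 < t$.

This multiple integral is exactly of the type considered in the preceding
problem, and it can be reduced to a simple integral as follows

$$\int\cdots\int e^{-\frac{v_1^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1} = \frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du.$$

After substitution, the final expression of F(t) is

$$F(t) = \frac{1}{2^{\frac{n-1}{2}}\sigma^{n-1}\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du \quad \text{for } t > 0,$$

$$F(t) = 0 \quad \text{for } t \le 0.$$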
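For n = 3 the result says that S has the distribution of Problem 1 with 2 in place of n, i.e., $F(t) = 1 - e^{-t/(2\sigma^2)}$. The sketch simulates S and compares, and also checks that the mean of S is near $(n-1)\sigma^2$, which follows from the same distribution. Parameters and seed are arbitrary.

```python
import math
import random
import statistics

def sample_S(n: int, sigma: float, rng: random.Random) -> float:
    """S = sum of squared deviations of n normal values from their mean s."""
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    s = sum(xs) / n
    return sum((x - s) ** 2 for x in xs)

rng = random.Random(4)
n, sigma, t = 3, 2.0, 5.0
draws = [sample_S(n, sigma, rng) for _ in range(50_000)]
empirical = sum(d < t for d in draws) / len(draws)
# For n = 3 the final formula reduces to 1 - e^{-t / (2 sigma^2)}
predicted = 1.0 - math.exp(-t / (2.0 * sigma * sigma))
print(empirical, predicted, statistics.fmean(draws))  # mean near (n - 1) sigma^2 = 8
```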


3. Problem 3. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1.

As in Prob. 2, we set

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad u_i = x_i - s \quad (i = 1, 2, \ldots, n)$$

and introduce the quantity

$$\varepsilon = \sqrt{\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{n}}.$$

What is the distribution function of the ratio

$$\frac{s}{\varepsilon}$$

or, which is the same, the probability F(t) of the inequality

$$s < t\varepsilon?$$


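Solution. First, assuming t to be positive, let us find the probability
$\Phi(t)$ of the inequality

$$s \ge t\varepsilon$$

or

$$u_1^2 + u_2^2 + \cdots + u_n^2 \le \frac{ns^2}{t^2} \qquad (s > 0).$$

This probability can be presented in the form

$$\Phi(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int_0^\infty e^{-\frac{ns^2}{2\sigma^2}}\,\Psi(s)\,ds$$

where the multiple integral

$$\Psi(s) = \int\cdots\int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1},$$

in which

$$u_n = -(u_1 + u_2 + \cdots + u_{n-1}),$$

is extended over the domain

$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 \le \frac{ns^2}{t^2}.$$

Proceeding in exactly the same manner as in Prob. 2, we can transform
$\Psi(s)$ into

$$\Psi(s) = \frac{1}{\sqrt n}\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

extended over the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 \le \frac{ns^2}{t^2}$$

in the space of the variables $v_1, v_2, \ldots, v_{n-1}$. For this multiple integral
we can substitute a simple integral

$$\frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac{ns^2}{t^2}} e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du = \frac{2\pi^{\frac{n-1}{2}}n^{\frac{n-1}{2}}s^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac1t} e^{-\frac{ns^2w^2}{2\sigma^2}}w^{n-2}\,dw$$

(the second form by the substitution $u = ns^2w^2$) and thus reduce $\Psi(s)$ to the form

$$\Psi(s) = \frac{2\pi^{\frac{n-1}{2}}n^{\frac{n-2}{2}}s^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac1t} e^{-\frac{ns^2w^2}{2\sigma^2}}w^{n-2}\,dw.$$

After substitution we can express $\Phi(t)$ as a repeated integral

$$\Phi(t) = \frac{2n^{\frac n2}}{\sqrt\pi\,(\sigma\sqrt2)^n\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^\infty e^{-\frac{ns^2}{2\sigma^2}}s^{n-1}\,ds\int_0^{\frac1t} e^{-\frac{ns^2w^2}{2\sigma^2}}w^{n-2}\,dw.$$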
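The derivative of $\Phi(t)$ is

$$\Phi'(t) = -\frac{2n^{\frac n2}}{\sqrt\pi\,(\sigma\sqrt2)^n\,\Gamma\left(\frac{n-1}{2}\right)}\,t^{-n}\int_0^\infty e^{-\frac{ns^2}{2\sigma^2}\left(1 + \frac{1}{t^2}\right)}s^{n-1}\,ds = -\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\,\frac{1}{(1 + t^2)^{\frac n2}},$$

whence

$$\Phi(t) = C - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^t \frac{dz}{(1 + z^2)^{\frac n2}}.$$

But $\Phi(+\infty) = 0$ and

$$\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^\infty \frac{dz}{(1 + z^2)^{\frac n2}} = \frac12,$$

so that $C = \frac12$ and

$$\Phi(t) = \frac12 - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^t \frac{dz}{(1 + z^2)^{\frac n2}}.$$

Such is the probability of the inequality

$$s \ge t\varepsilon.$$

The probability F(t) of the inequality

$$s < t\varepsilon$$

will be $1 - \Phi(t)$, or

$$F(t) = \frac12 + \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^t \frac{dz}{(1 + z^2)^{\frac n2}},$$

but this is established only for positive t. However, this result holds
for negative t as well. For t being negative, $t = -\tau$, the inequality

$$s < -\tau\varepsilon$$

is entirely equivalent to

$$-s > \tau\varepsilon$$

and, since $-s$ has the same distribution as s, its probability is evidently

$$F(-\tau) = \Phi(\tau) = \frac12 - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^\tau \frac{dz}{(1 + z^2)^{\frac n2}}.$$

But

$$\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^0 \frac{dz}{(1 + z^2)^{\frac n2}} = \frac12,$$

which permits writing the preceding expression for $F(-\tau)$ as follows:

$$F(-\tau) = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{-\tau}(1 + z^2)^{-\frac n2}\,dz.$$

Thus, no matter whether t is positive or negative, the distribution function
of the ratio

$$\frac{s}{\varepsilon},$$

or the probability of the inequality

$$s < t\varepsilon,$$

is given by

$$F(t) = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}(1 + z^2)^{-\frac n2}\,dz.$$

The distribution of the quotient $s/\varepsilon$ was discovered by a British
statistician who wrote under the pseudonym "Student," and it is commonly
referred to as "Student's distribution." The first rigorous proof
was published by R. A. Fisher.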
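The distribution function can be evaluated numerically with the substitution $z = \tan\theta$, under which $(1 + z^2)^{-n/2}dz$ becomes $\cos^{n-2}\theta\,d\theta$. The sketch below applies Simpson's rule (our choice of quadrature) on $(-\pi/2, \arctan t)$. For n = 3 the integral is elementary and gives a check. As an aside not taken from the text: $\sqrt{n-1}\,s/\varepsilon$ is the statistic usually tabulated as Student's t with n - 1 degrees of freedom.

```python
import math

def student_F(t: float, n: int, steps: int = 2000) -> float:
    """F(t) = K * int_{-inf}^t (1 + z^2)^(-n/2) dz with
    K = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2)).  With z = tan(theta) the
    integrand becomes cos(theta)^(n-2) on (-pi/2, arctan t); Simpson's rule."""
    k = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    lo, hi = -math.pi / 2, math.atan(t)
    h = (hi - lo) / steps          # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(lo + i * h) ** (n - 2)
    return k * total * h / 3

# For n = 3 the integral is elementary: F(t) = (1 + t / sqrt(1 + t^2)) / 2
print(student_F(0.7, 3), 0.5 * (1 + 0.7 / math.sqrt(1.49)))
```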

4. Problem 4. Variables x, y are in normal correlation. A sample of
n corresponding pairs $x_1, y_1;\ x_2, y_2;\ \ldots;\ x_n, y_n$ is taken and the "correlation
coefficient of the sample" is found by the formula

$$\rho = \frac{\sum(x_i - s)(y_i - s')}{\sqrt{\sum(x_i - s)^2\cdot\sum(y_i - s')^2}}$$

where, for the sake of abbreviation,

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_n}{n}.$$

Find the distribution function of $\rho$, that is, the probability P of the
inequality $\rho < R$ for a given R $(-1 < R < 1)$.

Solution. Since the expression of $\rho$ is homogeneous of degree 0 in
$x_1, x_2, \ldots, x_n$ and in $y_1, y_2, \ldots, y_n$, we can assume $\sigma_1 = \sigma_2 = 1$. Also without
loss of generality the expectations of x and y may be supposed = 0.
Denoting by r the correlation coefficient of x and y, the density of probability
in the two-dimensional distribution will be:

$$\frac{1}{2\pi\sqrt{1 - r^2}}\,e^{-\frac{x^2 + y^2 - 2rxy}{2(1 - r^2)}}.$$

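Hence the required probability will be expressed by the multiple integral

$$P = \frac{1}{(2\pi)^n(1 - r^2)^{\frac n2}}\int\cdots\int e^{-\frac{\psi}{2(1 - r^2)}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

extended over the 2n-dimensional domain

$$\sum(x_i - s)(y_i - s') < R\sqrt{\sum(x_i - s)^2\cdot\sum(y_i - s')^2} \qquad (3)$$

and

$$\psi = \sum x_i^2 + \sum y_i^2 - 2r\sum x_iy_i. \qquad (4)$$

Replacing $x_i, y_i$ $(i = 1, 2, \ldots, n)$, respectively, by $\sqrt{1 - r^2}\,x_i$, $\sqrt{1 - r^2}\,y_i$,
we can write P thus:

$$P = \frac{(1 - r^2)^{\frac n2}}{(2\pi)^n}\int\cdots\int e^{-\frac{\psi}{2}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

while (3) and (4) still hold but with the new notation for the variables.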
Let us set now

$$x_i - s = u_i, \qquad y_i - s' = v_i \qquad (i = 1, 2, \ldots, n);$$

then

$$u_1 + u_2 + \cdots + u_n = 0, \qquad v_1 + v_2 + \cdots + v_n = 0.$$

Introducing $s, u_1, u_2, \ldots, u_{n-1}$; $s', v_1, v_2, \ldots, v_{n-1}$ as new variables, we
find as in Sec. 2

$$P = \frac{n^2(1 - r^2)^{\frac n2}}{(2\pi)^n}\int\cdots\int e^{-\frac{\psi}{2}}\,ds\,ds'\,du_1\cdots du_{n-1}\,dv_1\cdots dv_{n-1}$$

where

$$\psi = ns^2 + ns'^2 - 2nrss' + \sum u_i^2 + \sum v_i^2 - 2r\sum u_iv_i$$

and the domain of integration is defined by the inequalities

$$-\infty < s < \infty, \qquad -\infty < s' < \infty,$$

$$\sum u_iv_i < R\sqrt{\sum u_i^2\cdot\sum v_i^2}.$$

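Now by the same linear transformation the quadratic forms $\sum u_i^2$,
$\sum v_i^2$ (each containing n - 1 independent variables) can be transformed
into

$$w_1^2 + w_2^2 + \cdots + w_{n-1}^2, \qquad z_1^2 + z_2^2 + \cdots + z_{n-1}^2;$$

at the same time

$$\sum_{i=1}^{n} u_iv_i = \sum_{i=1}^{n-1} w_iz_i.$$

Proceeding as in Sec. 2 and noting that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac n2(s^2 + s'^2 - 2rss')}\,ds\,ds' = \frac{2\pi}{n\sqrt{1 - r^2}},$$

we find that

$$P = \frac{(1 - r^2)^{\frac{n-1}{2}}}{(2\pi)^{n-1}}\int\cdots\int e^{-\frac{\chi}{2}}\,dw_1\cdots dw_{n-1}\,dz_1\cdots dz_{n-1}$$

where

$$\chi = \sum w_i^2 + \sum z_i^2 - 2r\sum w_iz_i$$

and the domain of integration in the space of 2n - 2 dimensions is defined
by

$$\sum w_iz_i < R\sqrt{\sum w_i^2\cdot\sum z_i^2}.$$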

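We shall integrate now in regard to the variables $z_1, z_2, \ldots, z_{n-1}$ for a fixed
system of values $w_1, w_2, \ldots, w_{n-1}$. To this end we use an orthogonal
transformation

$$z_1 = c_{1,1}\zeta_1 + c_{1,2}\zeta_2 + \cdots + c_{1,n-1}\zeta_{n-1}$$

$$z_2 = c_{2,1}\zeta_1 + c_{2,2}\zeta_2 + \cdots + c_{2,n-1}\zeta_{n-1}$$

$$\cdots\cdots\cdots$$

$$z_{n-1} = c_{n-1,1}\zeta_1 + c_{n-1,2}\zeta_2 + \cdots + c_{n-1,n-1}\zeta_{n-1}$$

in which the elements of the first column are

$$c_{i,1} = \frac{w_i}{w}, \qquad w = \sqrt{w_1^2 + w_2^2 + \cdots + w_{n-1}^2}.$$

Defining $\xi_1, \xi_2, \ldots, \xi_{n-1}$ by

$$w_1 = c_{1,1}\xi_1 + c_{1,2}\xi_2 + \cdots + c_{1,n-1}\xi_{n-1}$$

$$w_2 = c_{2,1}\xi_1 + c_{2,2}\xi_2 + \cdots + c_{2,n-1}\xi_{n-1}$$

$$\cdots\cdots\cdots$$

we shall have $\xi_1 = w$, $\xi_2 = \cdots = \xi_{n-1} = 0$. By the properties of
orthogonal transformations

$$\sum w_iz_i = \sum\xi_i\zeta_i = w\zeta_1, \qquad \sum z_i^2 = \zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2,$$

so that for a fixed system of values $w_1, w_2, \ldots, w_{n-1}$ the domain of
integration in the space of the variables $\zeta_1, \zeta_2, \ldots, \zeta_{n-1}$ will be

$$\zeta_1 < R\sqrt{\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2}. \qquad (5)$$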

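Thus we must first evaluate the integral

$$J = \int\cdots\int e^{-\frac12(\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2) + rw\zeta_1}\,d\zeta_1\,d\zeta_2\cdots d\zeta_{n-1}$$

extended over the domain (5). Supposing R > 0: if $\zeta_1 < 0$, no restriction is imposed upon $\zeta_2, \ldots, \zeta_{n-1}$; if $\zeta_1 > 0$, then

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 > \left(\frac{1}{R^2} - 1\right)\zeta_1^2.$$

Consequently the result of integration in regard to $\zeta_1, \ldots, \zeta_{n-1}$ can be
presented thus:

$$J = c\,e^{\frac{r^2w^2}{2}} - \int_0^\infty e^{-\frac12\zeta_1^2 + rw\zeta_1}\,d\zeta_1\int\cdots\int e^{-\frac12(\zeta_2^2 + \cdots + \zeta_{n-1}^2)}\,d\zeta_2\cdots d\zeta_{n-1}$$

where the inner integral is extended over the domain

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 < \left(\frac{1}{R^2} - 1\right)\zeta_1^2$$

and c is a constant. Making use of formula (1), Sec. 1, the expression of
J reduces to

$$J = c\,e^{\frac{r^2w^2}{2}} - \frac{\pi^{\frac{n-2}{2}}}{\Gamma\left(\frac{n-2}{2}\right)}\int_0^\infty e^{-\frac12\zeta^2 + rw\zeta}\,d\zeta\int_0^{\left(\frac{1}{R^2} - 1\right)\zeta^2} e^{-\frac u2}u^{\frac{n-4}{2}}\,du.$$

This has to be multiplied by

$$\frac{(1 - r^2)^{\frac{n-1}{2}}}{(2\pi)^{n-1}}\,e^{-\frac12(w_1^2 + w_2^2 + \cdots + w_{n-1}^2)}\,dw_1\cdots dw_{n-1}$$

and integrated over the whole space of the variables $w_1, w_2, \ldots, w_{n-1}$.
The resulting expression for P will be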
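$$P = \text{const.} - \frac{(1 - r^2)^{\frac{n-1}{2}}}{2^{n-1}\pi^{\frac n2}\Gamma\left(\frac{n-2}{2}\right)}\int\cdots\int e^{-\frac{w^2}{2}}\,\Omega\,dw_1\cdots dw_{n-1}$$

where

$$\Omega = \int_0^\infty e^{-\frac12\zeta^2 + rw\zeta}\,d\zeta\int_0^{\left(\frac{1}{R^2} - 1\right)\zeta^2} e^{-\frac u2}u^{\frac{n-4}{2}}\,du$$

and the constant does not depend on R.

Now we differentiate in regard to R, reverse the order of integrations,
and make use of formula (2), Sec. 1; the resulting value of dP/dR will
then be expressed as a double integral

$$\frac{dP}{dR} = \frac{4(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}}{2^{n-1}\sqrt\pi\,\Gamma\left(\frac{n-2}{2}\right)\Gamma\left(\frac{n-1}{2}\right)}\int_0^\infty\int_0^\infty e^{-\frac12(\xi^2 + \eta^2) + Rr\xi\eta}(\xi\eta)^{n-2}\,d\xi\,d\eta$$

or

$$\frac{dP}{dR} = \frac{(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}}{\pi\,(n-3)!}\int_0^\infty\int_0^\infty e^{-\frac12(\xi^2 + \eta^2) + Rr\xi\eta}(\xi\eta)^{n-2}\,d\xi\,d\eta,$$

since

$$\Gamma\left(\frac{n-2}{2}\right)\Gamma\left(\frac{n-1}{2}\right) = \frac{\sqrt\pi\,(n-3)!}{2^{n-3}}.$$

In the double integral we make transformation to new variables t, u
defined by

$$\xi = \frac{u}{t}, \qquad \eta = tu.$$

The Jacobian of $\xi$, $\eta$ in regard to u, t being $2u/t$, we have

$$\int_0^\infty\int_0^\infty e^{-\frac12(\xi^2 + \eta^2) + Rr\xi\eta}(\xi\eta)^{n-2}\,d\xi\,d\eta = 2\int_0^\infty\frac{dt}{t}\int_0^\infty e^{-u^2\left(\frac{t^2 + t^{-2}}{2} - Rr\right)}u^{2n-3}\,du = \Gamma(n-1)\int_0^\infty\frac{dt}{t\left(\frac{t^2 + t^{-2}}{2} - Rr\right)^{n-1}} = \Gamma(n-1)\int_0^\infty\frac{dt}{(\operatorname{ch}t - Rr)^{n-1}},$$

the last step following from the substitution $t = e^{\tau/2}$ and the evenness of $\operatorname{ch}\tau$ (after which $\tau$ is again denoted by t), and so, finally,

$$\frac{dP}{dR} = \frac{n-2}{\pi}(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}\int_0^\infty\frac{dt}{(\operatorname{ch}t - Rr)^{n-1}}.$$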


In case r = 0, that is, when the variables x, y are uncorrelated, we have
a very simple expression of P:

$$P = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt\pi\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-1}^{R}(1 - \rho^2)^{\frac{n-4}{2}}\,d\rho.$$
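For r = 0 the formula can be checked by simulation. With $\rho = \sin\theta$ the integrand becomes $\cos^{n-3}\theta$, and for n = 4 the distribution of ρ is uniform on (-1, 1), so that P = (1 + R)/2. The sketch (seed and sample sizes arbitrary, helper names hypothetical) compares both.

```python
import math
import random

def sample_corr(xs, ys):
    """Sample correlation coefficient rho as defined in Problem 4."""
    n = len(xs)
    s, s1 = sum(xs) / n, sum(ys) / n
    sxy = sum((x - s) * (y - s1) for x, y in zip(xs, ys))
    sxx = sum((x - s) ** 2 for x in xs)
    syy = sum((y - s1) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def P_uncorrelated(R: float, n: int, steps: int = 2000) -> float:
    """P(rho < R) for r = 0 and n >= 3.  With rho = sin(theta) the integrand
    (1 - rho^2)^((n-4)/2) d rho becomes cos(theta)^(n-3) d theta; Simpson's rule."""
    k = math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)) / math.sqrt(math.pi)
    lo, hi = -math.pi / 2, math.asin(R)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(lo + i * h) ** (n - 3)
    return k * total * h / 3

rng = random.Random(5)
n, R, trials = 4, 0.5, 40_000
below = 0
for _ in range(trials):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    below += sample_corr(xs, ys) < R
print(below / trials, P_uncorrelated(R, n))   # for n = 4, P = (1 + R) / 2 = 0.75
```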




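In case $r \neq 0$ the integral

$$\int_0^\infty\frac{dt}{(\operatorname{ch}t - Rr)^{n-1}}$$

can still be found in finite form. We have, in fact,

$$\int_0^\infty\frac{dt}{\operatorname{ch}t - Rr} = \frac{1}{\sqrt{1 - R^2r^2}}\left[\frac{\pi}{2} + \arcsin(Rr)\right],$$

whence

$$\int_0^\infty\frac{dt}{(\operatorname{ch}t - Rr)^{n-1}} = \frac{1}{(n-2)!}\,\frac{d^{\,n-2}}{d(Rr)^{n-2}}\left\{\frac{1}{\sqrt{1 - R^2r^2}}\left[\frac{\pi}{2} + \arcsin(Rr)\right]\right\},$$

and so

$$P = a_n\int_{-1}^{R}(1 - \rho^2)^{\frac{n-4}{2}}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{1}{\sqrt{1 - r^2\rho^2}}\left[\frac{\pi}{2} + \arcsin(r\rho)\right]\right\}d\rho,$$

where

$$a_n = \frac{(1 - r^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}.$$

When n is an even number, this integral appears in a very simple finite
form, but in case of an odd n certain integrals of a rather complicated
type appear. Besides, the behavior of P for somewhat large n cannot
be easily grasped by using this integral expression for P.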
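The finite form quoted for the basic integral is easy to confirm numerically. Writing a for Rr (with -1 < a < 1), this sketch integrates 1/(ch t - a) by Simpson's rule on [0, 40], the neglected tail being of order e^(-40), and compares with $[\pi/2 + \arcsin a]/\sqrt{1 - a^2}$.

```python
import math

def integral_numeric(a: float, upper: float = 40.0, steps: int = 40_000) -> float:
    """Simpson's rule for int_0^upper dt / (ch t - a); the tail beyond
    t = 40 is below 2 e^(-40) and is ignored."""
    h = upper / steps
    f = lambda t: 1.0 / (math.cosh(t) - a)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def integral_closed(a: float) -> float:
    """[pi/2 + arcsin(a)] / sqrt(1 - a^2), valid for -1 < a < 1."""
    return (math.pi / 2 + math.asin(a)) / math.sqrt(1.0 - a * a)

for a in (-0.6, 0.0, 0.35, 0.8):
    print(a, integral_numeric(a), integral_closed(a))
```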

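5. Fisher, who was the first to discover the rigorous distribution of the
correlation coefficient, called attention to the fact that, setting

$$\operatorname{th}z = \rho = \frac{\sum(x_i - s)(y_i - s')}{\sqrt{\sum(x_i - s)^2\cdot\sum(y_i - s')^2}},$$

the distribution of z will be nearly normal even for comparatively small
values of n. Let us set $\operatorname{th}\omega = R$, $\operatorname{th}\zeta = r$; then P can be expressed thus:

$$P = \frac{n-2}{\pi}\int_{-\infty}^{\omega}\int_0^\infty\frac{\operatorname{ch}z\,dt\,dz}{(\operatorname{ch}t\operatorname{ch}z\operatorname{ch}\zeta - \operatorname{sh}\zeta\operatorname{sh}z)^{n-1}}.$$

Instead of t it is convenient to introduce a new variable $\tau$ so that

$$\operatorname{ch}t\operatorname{ch}z\operatorname{ch}\zeta - \operatorname{sh}\zeta\operatorname{sh}z = \frac{\operatorname{ch}(z - \zeta)}{1 - \tau}.$$

Then

$$P = \frac{n-2}{\pi\sqrt2}\int_{-\infty}^{\omega}\left(\frac{\operatorname{ch}z}{\operatorname{ch}\zeta}\right)^{\frac12}\frac{dz}{[\operatorname{ch}(z - \zeta)]^{n-\frac32}}\int_0^1\frac{\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau}{\sqrt{1 - \kappa\tau}}$$

where

$$\kappa = \frac{\operatorname{ch}(z + \zeta)}{2\operatorname{ch}z\operatorname{ch}\zeta} \le \frac{\operatorname{ch}(\omega + \zeta)}{2\operatorname{ch}\omega\operatorname{ch}\zeta}$$

for all values of z under consideration. Now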


J '*ij __ -j-jn 2^^ ^ \/7rr(n — 1) 

0 \/i — pr r(^ “ i) 

— tY -Ht ■\/ iV{n — 1 )/, I P \ 

Jo Vr^ P^ r(n - i) X 2n - 1 j 

Jo vl ~ pi- Jo 

for 0 < p < 1 as can be easily verified. Consequently 


and 


since 


p — ““ 2)r (7i — JO f" dg _ 

" \7^1xn - ir J- -V^y icA(« - 

r 1 -L ® 1- 

L 2Sho,rM 2n - ij’ 


0 < < 1 . 


As to the integral in this formula, its approximate expression, omitting 
terms of the higher order, is: 


/: 


--j-i(.-r)' th^ 

2n ---3'^ 




Thus for somewhat large n the required value of P can be found with 
the help of a simple approximate formula. 

The various distributions dealt with in this chapter are undoubtedly
of great value when applied to variables which have normal or nearly
normal distribution. Whether they are always used legitimately can
be doubted. At least the "onus probandi" that the "populations" with
which they deal are even approximately normal rests with the statisticians.


Problems for Solution 


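1. Show that

$$\lim_{n\to\infty}\frac{1}{2^{\frac n2}\Gamma\left(\frac n2\right)}\int_0^{\,n + t\sqrt{2n}} e^{-\frac u2}u^{\frac n2 - 1}\,du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{v^2}{2}}\,dv.$$

Hint: Liapounoff's theorem and Prob. 1, page 332.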





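2. With the same assumptions and notations as in Prob. 3, page 336, show that the
distribution function of the quotient

$$\frac{x_i - s}{\varepsilon} \qquad (i = 1, 2, \ldots, n)$$

is

$$F(t) = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi(n-1)}\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-\sqrt{n-1}}^{t}\left(1 - \frac{z^2}{n-1}\right)^{\frac{n-4}{2}}dz \quad \text{if } |t| \le \sqrt{n-1},$$

$$F(t) = 1 \quad \text{if } t > \sqrt{n-1}, \qquad F(t) = 0 \quad \text{if } t < -\sqrt{n-1}.$$

It is worthy of notice that for n = 4 the distribution is uniform.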

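3. In two series of observations, samples $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_{n'}$ from
the same normally distributed population (or of the same normally distributed variable)
are obtained. Denoting for brevity

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_{n'}}{n'},$$

$$\varepsilon = \sqrt{\left(\frac1n + \frac1{n'}\right)\left[\sum(x_i - s)^2 + \sum(y_j - s')^2\right]},$$

find the distribution function of the quotient

$$\frac{s - s'}{\varepsilon}$$

("Student"). Ans.

$$F(t) = \frac{\Gamma\left(\frac{n + n' - 1}{2}\right)}{\sqrt\pi\,\Gamma\left(\frac{n + n' - 2}{2}\right)}\int_{-\infty}^{t}\frac{d\tau}{(1 + \tau^2)^{\frac{n + n' - 1}{2}}}.$$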
1. Euler's Summation Formula. Let f(x) be a function with a
continuous derivative f'(x) in an interval (a, b), where a and b > a are
arbitrary real numbers. The notation

$$\sum_{a<n\le b} f(n)$$

will be used to designate the sum extended over all integers n which are
> a and ≤ b. It is an important problem to devise means for the approximate
evaluation of the above sum when it contains a considerable number
of terms.

Let [x], as usual, denote the largest integer contained in a real number
x, so that

$$x = [x] + \theta$$

where $\theta$, the so-called "fractional part" of x, satisfies the inequalities

$$0 \le \theta < 1.$$

Considered as functions of a continuous variable x, both [x] and $\theta$ have
discontinuities for integral values of x. The function

$$\rho(x) = \frac12 - \theta = [x] - x + \frac12$$

is likewise discontinuous for integral values of x. Besides, it is a periodic
function of x with the period 1; that is, we have

$$\rho(x + 1) = \rho(x)$$

for any real x. With this notation adopted we have the following
important formula:

$$\sum_{a<n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx, \qquad (1)$$

which is known as "Euler's summation formula."

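Proof. Let k be the least integer > a and l the greatest integer ≤ b.
The sum in the left member of (1) is, by definition,

$$f(k) + f(k+1) + \cdots + f(l)$$

and we must show that this is equal to the right member. To this end
we write first

$$\int_a^b \rho(x)f'(x)\,dx = \int_a^k \rho(x)f'(x)\,dx + \int_l^b \rho(x)f'(x)\,dx + \sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx.$$

Next, since j is an integer,

$$\int_j^{j+1}\rho(x)f'(x)\,dx = \int_j^{j+1}\left(j - x + \frac12\right)f'(x)\,dx = -\frac{f(j) + f(j+1)}{2} + \int_j^{j+1} f(x)\,dx$$

and

$$\sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx = -\frac{f(k) + f(l)}{2} - \sum_{n=k+1}^{l-1} f(n) + \int_k^l f(x)\,dx.$$

On the other hand,

$$\int_a^k \rho(x)f'(x)\,dx = \int_a^k\left(k - x - \frac12\right)f'(x)\,dx = -\frac{f(k)}{2} - \rho(a)f(a) + \int_a^k f(x)\,dx,$$

$$\int_l^b \rho(x)f'(x)\,dx = \int_l^b\left(l - x + \frac12\right)f'(x)\,dx = -\frac{f(l)}{2} + \rho(b)f(b) + \int_l^b f(x)\,dx,$$

so that finally

$$\int_a^b \rho(x)f'(x)\,dx = -f(k) - f(k+1) - \cdots - f(l) + \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a),$$

whence

$$\sum_{a<n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx,$$

which completes the proof of Euler's formula.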
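Formula (1) can be tested numerically. The sketch splits $\int\rho(x)f'(x)\,dx$ at the integers, where ρ jumps, and uses Simpson's rule on each piece, where ρ(x) = k - x + 1/2 is linear; the step counts and the examples f(x) = 1/x and f(x) = x³ are arbitrary choices.

```python
import math

def rho(x: float) -> float:
    """rho(x) = [x] - x + 1/2."""
    return math.floor(x) - x + 0.5

def simpson(g, lo, hi, steps=400):
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def euler_right_member(f, df, antiderivative, a, b):
    """Right member of formula (1); the integral of rho(x) f'(x) is split at
    the integers strictly between a and b."""
    cuts = [a] + list(range(math.floor(a) + 1, math.ceil(b))) + [b]
    tail = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        k = math.floor(lo)  # on (lo, hi) the integer part of x is k
        tail += simpson(lambda x: (k - x + 0.5) * df(x), lo, hi)
    return (antiderivative(b) - antiderivative(a)
            + rho(b) * f(b) - rho(a) * f(a) - tail)

def left_member(f, a, b):
    """Sum of f(n) over the integers n with a < n <= b."""
    return sum(f(n) for n in range(math.floor(a) + 1, math.floor(b) + 1))

a, b = 0.3, 10.7
print(left_member(lambda x: 1 / x, a, b),
      euler_right_member(lambda x: 1 / x, lambda x: -1 / x ** 2, math.log, a, b))
```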

Corollary 1. The integral

$$\sigma(x) = \int_0^x \rho(z)\,dz$$

represents a continuous and periodic function of x with the period 1. For

$$\sigma(x + 1) - \sigma(x) = \int_x^{x+1}\rho(z)\,dz = \int_0^1 \rho(z)\,dz = \int_0^1\left(\frac12 - z\right)dz = 0.$$

If $0 \le x \le 1$,

$$\sigma(x) = \int_0^x\left(\frac12 - z\right)dz = \frac{x(1 - x)}{2}$$

and in general

$$\sigma(x) = \frac{\theta(1 - \theta)}{2}$$

where $\theta$ is the fractional part of x. Hence, for every real x

$$0 \le \sigma(x) \le \frac18.$$

Supposing that f''(x) exists and is continuous in (a, b) and integrating by
parts, we get

$$\int_a^b \rho(x)f'(x)\,dx = \sigma(b)f'(b) - \sigma(a)f'(a) - \int_a^b \sigma(x)f''(x)\,dx,$$

which leads to another form of Euler's formula:

$$\sum_{a<n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \sigma(b)f'(b) + \sigma(a)f'(a) + \int_a^b \sigma(x)f''(x)\,dx.$$

Corollary 2. If f(x) is defined for all $x \ge a$ and possesses a continuous
derivative throughout the interval $(a, +\infty)$; if, besides, the integral

$$\int_a^\infty \rho(x)f'(x)\,dx$$

exists, then for a variable limit b we have

$$\sum_{a<n\le b} f(n) = C + \int_a^b f(x)\,dx + \rho(b)f(b) + \int_b^\infty \rho(x)f'(x)\,dx \qquad (2)$$

where C is a constant with respect to b.

It suffices to substitute for

$$\int_a^b \rho(x)f'(x)\,dx$$

the difference

$$\int_a^\infty \rho(x)f'(x)\,dx - \int_b^\infty \rho(x)f'(x)\,dx$$

and separate the terms depending upon b from those involving a.

2. Stirling's Formula. Factorials increase with extreme rapidity
and their exact computation soon becomes practically impossible. The
question then naturally arises of finding a convenient approximate
expression for large factorials, which question is answered by a celebrated
formula usually known as "Stirling's formula," although, in the main,
it was established by de Moivre in connection with problems on probability.
De Moivre did not establish the relation to the number

$$\pi = 3.14159\ldots$$

of the constant involved in his formula; it was done by Stirling.

In formula (2) it suffices to take $a = \frac12$, $f(x) = \log x$, and replace b
by an arbitrary integer n to arrive at the remarkable expression

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n + \int_n^\infty \frac{\rho(x)\,dx}{x}$$

where C is a constant. For the sake of brevity we shall set

$$\omega(n) = \int_n^\infty \frac{\rho(x)\,dx}{x}.$$

Now

$$\omega(n) = \sum_{k=n}^\infty \int_k^{k+1}\frac{\rho(x)\,dx}{x}$$

and

$$\int_k^{k+1}\frac{\rho(x)\,dx}{x} = \int_0^1 \frac{\rho(u)\,du}{u + k} = \int_0^{\frac12}\frac{\left(\frac12 - u\right)du}{u + k} - \int_0^{\frac12}\frac{\left(\frac12 - u\right)du}{k + 1 - u} = \frac12\int_0^{\frac12}\frac{(1 - 2u)^2\,du}{(k + u)(k + 1 - u)}.$$

Hence

$$\omega(n) = \int_0^{\frac12}(1 - 2u)^2F_n(u)\,du$$

where

$$F_n(u) = \sum_{k=n}^\infty \frac{1}{2(k + u)(k + 1 - u)}.$$

Since

$$(k + u)(k + 1 - u) = k(k + 1) + u - u^2,$$

it follows that for $0 < u < \frac12$

$$(k + u)(k + 1 - u) > k(k + 1),$$

$$(k + u)(k + 1 - u) < \left(k + \tfrac12\right)^2 < \left(k + \tfrac14\right)\left(k + \tfrac54\right).$$

Thus for $0 < u < \frac12$

$$F_n(u) < \sum_{k=n}^\infty \frac{1}{2k(k + 1)} = \frac{1}{2n}, \qquad F_n(u) > \sum_{k=n}^\infty \frac{1}{2\left(k + \frac14\right)\left(k + \frac54\right)} = \frac{1}{2\left(n + \frac14\right)}.$$

Making use of these limits, we find that

$$\omega(n) < \frac{1}{2n}\int_0^{\frac12}(1 - 2u)^2\,du = \frac{1}{12n}, \qquad \omega(n) > \frac{1}{12\left(n + \frac14\right)},$$

and consequently can set

$$\omega(n) = \frac{1}{12(n + \theta)}, \qquad 0 < \theta < \tfrac14.$$

Accordingly

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n + \frac{1}{12(n + \theta)}.$$

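The constant C depends in a remarkable way on the number $\pi$.
To show this we start from the well-known expression for $\pi$ due to Wallis:

$$\frac{\pi}{2} = \lim_{n\to\infty}\left(\frac21\cdot\frac23\cdot\frac43\cdot\frac45\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1}\right),$$

which follows from the infinite product

$$\sin x = x\left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{4\pi^2}\right)\left(1 - \frac{x^2}{9\pi^2}\right)\cdots$$

by taking $x = \pi/2$. Since

$$\frac21\cdot\frac23\cdot\frac43\cdot\frac45\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1} = \left[\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n - 1)}\right]^2\frac{1}{2n + 1}$$

we get from Wallis' formula

$$\sqrt\pi = \lim_{n\to\infty}\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n - 1)\sqrt n}.$$

On the other hand,

$$\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n - 1)} = \frac{2^{2n}(1\cdot2\cdot3\cdots n)^2}{1\cdot2\cdot3\cdots2n},$$

so that

$$\sqrt\pi = \lim_{n\to\infty}\frac{2^{2n}(1\cdot2\cdot3\cdots n)^2}{(1\cdot2\cdot3\cdots2n)\sqrt n}$$

or, taking logarithms,

$$\log\sqrt\pi = \lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac12\log n\right].$$

But, neglecting infinitesimals,

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n,$$

$$\log(1\cdot2\cdot3\cdots2n) = C + \left(2n + \tfrac12\right)\log2n - 2n,$$

whence

$$\lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac12\log n\right] = C - \tfrac12\log2.$$

Thus

$$\log\sqrt\pi = C - \tfrac12\log2, \qquad C = \log\sqrt{2\pi},$$

and finally

$$\log(1\cdot2\cdot3\cdots n) = \log\sqrt{2\pi} + \left(n + \tfrac12\right)\log n - n + \frac{1}{12(n + \theta)}, \qquad 0 < \theta < \tfrac14. \qquad (3)$$

This is equivalent to two inequalities

$$e^{\frac{1}{12n + 3}} < \frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\,n^ne^{-n}} < e^{\frac{1}{12n}},$$

which show that for indefinitely increasing n

$$\lim\frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\,n^ne^{-n}} = 1.$$

This result is commonly known as Stirling's formula.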
For a finite n we have

$$1\cdot2\cdot3\cdots n = \sqrt{2\pi n}\,n^ne^{-n + \frac{1}{12(n + \theta)}}, \qquad 0 < \theta < \tfrac14.$$

The expression

$$\sqrt{2\pi n}\,n^ne^{-n}$$

is thus an approximate value of the factorial $1\cdot2\cdot3\cdots n$ for large n
in the sense that the ratio of both is near to 1; that is, the relative error is
small. On the contrary, the absolute error will be arbitrarily large for
large n, but this is irrelevant when Stirling's approximation is applied
to quotients of factorials.
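Formula (3) can be illustrated by solving it for θ, taking log n! from `math.lgamma(n + 1)`. We assume double-precision accuracy is adequate, which holds comfortably for the moderate n used here. The computed values should lie between 0 and 1/4.

```python
import math

def stirling_theta(n: int) -> float:
    """Solve log n! = log sqrt(2 pi) + (n + 1/2) log n - n + 1/(12 (n + theta))
    for theta, with log n! taken from math.lgamma(n + 1)."""
    omega = (math.lgamma(n + 1)
             - (0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n))
    return 1.0 / (12.0 * omega) - n

thetas = [stirling_theta(n) for n in range(1, 101)]
print(min(thetas), max(thetas))           # all should lie in (0, 1/4)
ratio = math.factorial(10) / (math.sqrt(20 * math.pi) * 10 ** 10 * math.exp(-10))
print(ratio)                              # about 1.0084
```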

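In this connection it is useful to derive two further inequalities.

Let m < n; we have, then,

$$F_m(u) - F_n(u) = \sum_{k=m}^{n-1}\frac{1}{2(k + u)(k + 1 - u)},$$

and further, supposing $0 < u < \frac12$,

$$F_m(u) - F_n(u) < \sum_{k=m}^{n-1}\frac{1}{2k(k + 1)} = \frac{1}{2m} - \frac{1}{2n},$$

$$F_m(u) - F_n(u) > \sum_{k=m}^{n-1}\frac{1}{2\left(k + \frac14\right)\left(k + \frac54\right)} = \frac{1}{2\left(m + \frac14\right)} - \frac{1}{2\left(n + \frac14\right)}.$$

Hence,

$$\omega(m) - \omega(n) < \frac{1}{12m} - \frac{1}{12n}, \qquad \omega(m) - \omega(n) > \frac{1}{12\left(m + \frac14\right)} - \frac{1}{12\left(n + \frac14\right)},$$

and, if l is a third arbitrary positive integer,

$$\omega(m) + \omega(l) - \omega(n) < \frac{1}{12m} + \frac{1}{12l} - \frac{1}{12n},$$

$$\omega(m) + \omega(l) - \omega(n) > \frac{1}{12\left(m + \frac14\right)} + \frac{1}{12\left(l + \frac14\right)} - \frac{1}{12\left(n + \frac14\right)}.$$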

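3. Some Definite Integrals. The value of the important definite
integral

$$\int_0^\infty e^{-t^2}\,dt$$

can be found in various ways. One of the simplest is the following: Let

$$J_n = \int_0^\infty e^{-t^2}t^n\,dt$$

in general, where n is an arbitrary integer ≥ 0. Integrating by parts one
can easily establish the recurrence relation

$$J_n = \frac{n - 1}{2}J_{n-2},$$

whence

$$J_{2m} = \frac{1\cdot3\cdot5\cdots(2m - 1)}{2^m}J_0, \qquad J_{2m+1} = \frac{1\cdot2\cdot3\cdots m}{2}.$$

On the other hand,

$$J_{n+1} + 2\lambda J_n + \lambda^2J_{n-1} = \int_0^\infty e^{-t^2}t^{n-1}(t + \lambda)^2\,dt,$$

which shows that

$$J_{n+1} + 2\lambda J_n + \lambda^2J_{n-1} > 0$$

for all real $\lambda$. Hence, the roots of the polynomial in the left member are
imaginary, and this implies

$$J_n^2 < J_{n-1}J_{n+1}.$$

Taking n = 2m and n = 2m + 1 and using the preceding expressions
for $J_{2m}$ and $J_{2m+1}$, we find

$$\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)}\cdot\frac{1}{2\sqrt{m + \frac12}} < J_0 < \frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)}\cdot\frac{1}{2\sqrt m}.$$

But

$$\lim_{m\to\infty}\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)\sqrt m} = \sqrt\pi;$$

hence

$$J_0 = \int_0^\infty e^{-t^2}\,dt = \frac12\sqrt\pi.$$

Here substituting $t = \sqrt a\,u$, where a is a positive parameter, we get

$$\int_0^\infty e^{-au^2}\,du = \frac12\sqrt{\frac{\pi}{a}}.$$

As a generalization of the last integral we may consider the following one:

$$V = \int_0^\infty e^{-au^2}\cos bu\,du.$$

The simplest way to find the value of this integral is to take the derivative

$$\frac{dV}{db} = -\int_0^\infty e^{-au^2}\sin bu\cdot u\,du$$

and transform the right member by partial integration. The result is

$$\frac{dV}{db} = -\frac{b}{2a}V$$

or

$$\frac{d}{db}\left(Ve^{\frac{b^2}{4a}}\right) = 0,$$

whence

$$V = Ce^{-\frac{b^2}{4a}}.$$

To determine the constant C, take b = 0; then

$$C = (V)_{b=0} = \frac12\sqrt{\frac{\pi}{a}},$$

so that finally

$$\int_0^\infty e^{-au^2}\cos bu\,du = \frac12\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}.$$

The equivalent form of this integral is as follows:

$$\int_{-\infty}^{\infty} e^{-au^2}\cos bu\,du = \int_{-\infty}^{\infty} e^{-au^2 + ibu}\,du = \sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}.$$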
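A numeric check of the last result by Simpson's rule on [0, 12]. The truncation point and step count are arbitrary choices that make the neglected tail and the quadrature error negligible for the values of a used (a ≥ 0.5).

```python
import math

def gauss_cos_numeric(a: float, b: float, upper: float = 12.0, steps: int = 20_000) -> float:
    """Simpson's rule for int_0^upper e^(-a u^2) cos(b u) du; for a >= 0.5 the
    neglected tail beyond u = 12 is smaller than e^(-72)."""
    h = upper / steps
    f = lambda u: math.exp(-a * u * u) * math.cos(b * u)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def gauss_cos_closed(a: float, b: float) -> float:
    """(1/2) sqrt(pi / a) e^(-b^2 / (4 a))."""
    return 0.5 * math.sqrt(math.pi / a) * math.exp(-b * b / (4 * a))

for a, b in ((1.3, 2.0), (0.5, 0.0), (2.0, 5.0)):
    print(a, b, gauss_cos_numeric(a, b), gauss_cos_closed(a, b))
```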
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METHOD OF MOMENTS AND ITS APPLICATIONS 

1. Introductory Remarks. To prove the fundamental limit theorem 
Tshebysheff devised an ingenious method, known as the method of 
moments/’ which later was completed and simplified by one of the most 
prominent among Tshebysheff’s disciples, the late Markoff. The 
simplicity and elegance inherent in this method of moments make it 
advisable to present in this Appendix a brief exposition of it. 

The distribution of a mass spread over a given interval $(a, b)$ may be characterized by a never decreasing function $\varphi(x)$, defined in $(a, b)$ and varying from $\varphi(a) = 0$ to $\varphi(b) = m_0$, where $m_0$ is the total mass contained in $(a, b)$. Since $\varphi(x)$ is never decreasing, for any particular point $x_0$, both the limits

$$\lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0), \qquad \lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0)$$

exist when a positive number $\epsilon$ tends to 0. Evidently

$$\varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0).$$

If

$$\varphi(x_0 - 0) = \varphi(x_0 + 0) = \varphi(x_0),$$

then $x_0$ is a “point of continuity” of $\varphi(x)$. In case

$$\varphi(x_0 + 0) > \varphi(x_0 - 0),$$

$x_0$ is a point of discontinuity of $\varphi(x)$, and the positive difference

$$\varphi(x_0 + 0) - \varphi(x_0 - 0)$$

may be considered as a mass concentrated at the point $x_0$. In all cases $\varphi(x_0 - 0)$ is the total mass on the segment $(a, x_0)$ excluding the end point $x_0$, whereas $\varphi(x_0 + 0)$ is the mass spread over the same segment including the point $x_0$.

The points of discontinuity, if there are any, form an enumerable set, 
whence it follows that in any part of the interval (a, b) there are points of 
continuity. 

If for any sufficiently small positive $\epsilon$

$$\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon),$$

$x_0$ is called a “point of increase” of $\varphi(x)$. There is at least one point of increase and there might be infinitely many. For instance, if

$$\varphi(x) = 0 \quad\text{for } a \le x \le c, \qquad \varphi(x) = m_0 \quad\text{for } c < x \le b,$$

then $c$ is the only point of increase. On the other hand, for

$$\varphi(x) = m_0\,\frac{x - a}{b - a}$$

every point of the interval $(a, b)$ is a point of increase. In case of a finite number of points of increase the whole mass is concentrated in these points and the distribution function $\varphi(x)$ is a step function with a finite number of steps.

Stieltjes' integrals

$$\int_a^b d\varphi(x) = m_0, \quad \int_a^b x\,d\varphi(x) = m_1, \quad \ldots, \quad \int_a^b x^i\,d\varphi(x) = m_i$$

represent respectively the whole mass $m_0$ and its moments about the origin of the order $1, 2, \ldots, i$. When the distribution function $\varphi(x)$ is given, the moments $m_0, m_1, m_2, \ldots, m_i$ (provided they exist) are determined. If, however, these moments are given and are known to originate in a certain distribution of a mass over $(a, b)$, the question may be raised: with what error can the mass spread over an interval $(a, x)$ be determined by these data? In other words, given $m_0, m_1, m_2, \ldots, m_i$, what are the precise upper and lower bounds of a mass spread over an interval $(a, x)$? Such is the question raised by Tshebysheff in a short but important article “Sur les valeurs limites des intégrales” (1874).¹ The results contained in this article, including very remarkable inequalities which indeed are of fundamental importance, are given without proof. The first proof of these results and the complete solution of the question raised by Tshebysheff was given by Markoff in his eminent thesis “On some applications of algebraic continued fractions” (St. Petersburg, 1884), written in Russian and therefore comparatively little known.

Suppose that $\rho_i$ is the limit of the error with which we can evaluate the mass belonging to the interval $(a, x)$ or, which is almost the same, the value of $\varphi(x)$, when the moments $m_0, m_1, m_2, \ldots, m_i$ are given. If, with $i$ tending to infinity, $\rho_i$ tends to 0 for any given $x$, then the distribution function $\varphi(x)$ will be completely determined by giving all the moments

$$m_0, m_1, m_2, \ldots.$$

One case of this kind, that in which

$$m_0 = 1, \qquad m_{2k} = \frac{1\cdot3\cdot5\cdots(2k-1)}{2^k}, \qquad m_{2k+1} = 0,$$

was considered by Tshebysheff in a later paper, “Sur deux théorèmes relatifs aux probabilités” (1887),² devoted to the application of his method to the proof of the limit theorem under certain rather general conditions. The success of this proof is due to the fact that the moments, as given above, uniquely determine the normal distribution

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du$$

of the mass 1 over the infinite interval $(-\infty, +\infty)$.

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.

After these preliminary remarks and before proceeding to an orderly
exposition of the method of moments, it is advisable to devote a few pages
to continued fractions associated with power series, for continued fractions
are the natural tools in questions of the kind we shall consider.

² Oeuvres complètes de P. L. Tshebysheff, Tome 2, p. 482.

2. Continued Fractions Associated with Power Series. Let

$$\Phi(z) = \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}} + \cdots$$

be a power series arranged according to decreasing powers of $z$ where the smallest exponent $\alpha_1$ is positive. We consider this power series from a purely formal point of view merely as a means to form a sequence of rational fractions

$$\frac{A_1}{z^{\alpha_1}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}}, \quad \ldots$$

and we need not be concerned about its convergence.

Evidently $1/\Phi(z)$ can again be expanded into a power series, arranged according to decreasing powers of $z$. Let its integral part, containing non-negative powers of $z$, be denoted by $q_1(z)$, and let the fractional part

$$\frac{B_1}{z^{\beta_1}} + \frac{B_2}{z^{\beta_2}} + \cdots,$$

containing negative powers of $z$, be denoted by $-\Phi_1(z)$, so that

$$\frac{1}{\Phi(z)} = q_1(z) - \Phi_1(z).$$

In the same way $1/\Phi_1(z)$ can be represented thus:

$$\frac{1}{\Phi_1(z)} = q_2(z) - \Phi_2(z),$$

where $q_2(z)$ is a polynomial and

$$\Phi_2(z) = \frac{C_1}{z^{\gamma_1}} + \frac{C_2}{z^{\gamma_2}} + \cdots,$$

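a power series containing only negative powers of $z$. Further, we shall have

$$\frac{1}{\Phi_2(z)} = q_3(z) - \Phi_3(z)$$

with a certain polynomial $q_3(z)$ and a power series $\Phi_3(z)$ containing negative powers of $z$, and so on. Thus we are led to consider a continued fraction (finite or infinite)

$$\cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3 - \cdots}}} \tag{1}$$

associated with $\Phi(z)$ in the sense that the formal expansion of

$$\cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_n - \Phi_n(z)}}}$$

into a power series will reproduce exactly $\Phi(z)$. The continued fraction (1) is again considered from a purely formal standpoint as a mere abbreviation of the sequence of its convergents

$$\frac{P_1}{Q_1} = \frac{1}{q_1}, \qquad \frac{P_2}{Q_2} = \cfrac{1}{q_1 - \cfrac{1}{q_2}}, \qquad \frac{P_3}{Q_3} = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3}}}, \qquad \ldots$$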

The polynomials

$$P_1, P_2, P_3, \ldots; \qquad Q_1, Q_2, Q_3, \ldots$$

can be found step by step by the recurrence relations

$$\begin{aligned} P_i &= q_iP_{i-1} - P_{i-2}, & P_1 &= 1, & P_0 &= 0,\\ Q_i &= q_iQ_{i-1} - Q_{i-2}, & Q_1 &= q_1, & Q_0 &= 1, \end{aligned} \qquad (i = 2, 3, 4, \ldots) \tag{2}$$

from which the following identical relation follows:

$$P_i(z)Q_{i-1}(z) - Q_i(z)P_{i-1}(z) = 1, \tag{3}$$

showing that all fractions

$$\frac{P_i(z)}{Q_i(z)}$$

are irreducible. Evidently the degrees of consecutive denominators of convergents form an increasing sequence and the degree of $Q_i(z)$ is at least $i$. Since

$$\Phi(z) = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_{i+1} - \Phi_{i+1}(z)}}},$$

we can write

$$\Phi(z) = \frac{P_i\,(q_{i+1} - \Phi_{i+1}(z)) - P_{i-1}}{Q_i\,(q_{i+1} - \Phi_{i+1}(z)) - Q_{i-1}} = \frac{P_{i+1} - P_i\Phi_{i+1}(z)}{Q_{i+1} - Q_i\Phi_{i+1}(z)}$$

in the sense that the formal development of the right-hand member is identical with $\Phi(z)$. By virtue of relation (3)


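$$\Phi(z) - \frac{P_i}{Q_i} = \frac{1}{Q_i\,(Q_{i+1} - Q_i\Phi_{i+1}(z))}.$$

The degree of $Q_i$ being $\lambda_i$ and that of $Q_{i+1}$ being $\lambda_{i+1}$, the expansion of

$$\frac{1}{Q_i\,(Q_{i+1} - Q_i\Phi_{i+1})}$$

in a series of descending powers of $z$ begins with the power $z^{-\lambda_i - \lambda_{i+1}}$. Hence,

$$\Phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots$$

and, since $\lambda_{i+1} \ge \lambda_i + 1$, the expansion of $\Phi(z) - P_i/Q_i$ begins with a term of the order $2\lambda_i + 1$ in $1/z$ at least. This property characterizes the convergents $P_i/Q_i$ completely. For let $P/Q$ be a rational fraction whose denominator is of the $n$th degree and such that in the expansion of

$$\Phi(z) - \frac{P}{Q}$$

the lowest term is of the order $2n + 1$ in $1/z$ at least. Then $P/Q$ coincides with one of the convergents to the continued fraction (1). Let $i$ be determined by the condition

$$\lambda_i \le n < \lambda_{i+1}.$$

Then

$$\Phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots, \qquad \Phi(z) - \frac{P}{Q} = \frac{N}{z^{2n+1}} + \cdots,$$

whence in the expansion of

$$\frac{P}{Q} - \frac{P_i}{Q_i}$$

the lowest term will be of an order in $1/z$ not less than the smaller of the two numbers $2n + 1$ and $\lambda_i + \lambda_{i+1}$. Hence, the degree of

$$PQ_i - P_iQ$$

in $z$ is not greater than the larger of the two numbers

$$\lambda_i - n - 1 \quad\text{and}\quad n - \lambda_{i+1},$$

which are both negative, while $PQ_i - P_iQ$ is a polynomial. Hence, identically,

$$PQ_i - P_iQ = 0$$

or

$$\frac{P}{Q} = \frac{P_i}{Q_i},$$

which proves the statement.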
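The recurrence (2), the identity (3) and the characterization just proved can be checked on a concrete case. The sketch below is our own illustration, with hypothetical helper names. Anticipating Sec. 3, it assumes linear partial denominators $q_s = \alpha_s z + \beta_s$, taking $\alpha = 1, 2, \tfrac12, \tfrac43$ and $\beta_s = 0$. These are the values that follow from the relations $\alpha_n = -2c_n/c_{n-1}$, $\beta_n = 0$ of Sec. 7 for the normal moments $m_0 = 1$, $m_{2k} = 1\cdot3\cdots(2k-1)/2^k$, $m_{2k+1} = 0$. The code builds $P_s$ and $Q_s$ exactly with rational arithmetic and expands $P_s/Q_s$ in powers of $1/z$. Since the degree of $Q_s$ here is $s$, the first $2s$ coefficients should coincide with $m_0, \ldots, m_{2s-1}$.

```python
from fractions import Fraction as F

def polymul(p, q):
    """Product of polynomials stored as coefficient lists, lowest degree first."""
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polysub(p, q):
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def convergents(alphas, betas):
    """Pairs (P_s, Q_s), s = 1, 2, ..., from recurrence (2) with q_s = alpha_s z + beta_s."""
    P_prev, P = [F(0)], [F(1)]                        # P_0 = 0, P_1 = 1
    Q_prev, Q = [F(1)], [F(betas[0]), F(alphas[0])]   # Q_0 = 1, Q_1 = q_1
    out = [(P, Q)]
    for a, b in zip(alphas[1:], betas[1:]):
        q = [F(b), F(a)]
        P_prev, P = P, polysub(polymul(q, P), P_prev)
        Q_prev, Q = Q, polysub(polymul(q, Q), Q_prev)
        out.append((P, Q))
    return out

def series_in_inverse_z(P, Q, terms):
    """Coefficients a_0, a_1, ... of P(z)/Q(z) = a_0/z + a_1/z^2 + ... (deg P < deg Q)."""
    d = len(Q) - 1
    a = []
    for r in range(terms):
        k = d - 1 - r
        val = P[k] if 0 <= k < len(P) else F(0)
        for j in range(max(d - r, 0), d):
            val -= Q[j] * a[j - d + r]
        a.append(val / Q[d])
    return a

alphas = [F(1), F(2), F(1, 2), F(4, 3)]
betas = [F(0)] * 4
moments = [F(1), F(0), F(1, 2), F(0), F(3, 4), F(0), F(15, 8), F(0)]
for s, (P, Q) in enumerate(convergents(alphas, betas), start=1):
    print(s, series_in_inverse_z(P, Q, 2 * s) == moments[:2 * s])
```

Each line should report `True`. The next coefficient generally differs: for $s = 2$ the term in $1/z^5$ of $P_2/Q_2$ is $\tfrac14$, not $m_4 = \tfrac34$.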


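3. Continued Fraction Associated with $\displaystyle\int_a^b \frac{d\varphi(x)}{z - x}$. Let $\varphi(x)$ be a never decreasing function characterizing the distribution of a mass over an interval $(a, b)$. The moments of this distribution up to the moment of the order $2n$ are represented by integrals

$$m_0 = \int_a^b d\varphi(x), \quad m_1 = \int_a^b x\,d\varphi(x), \quad \ldots, \quad m_{2n} = \int_a^b x^{2n}\,d\varphi(x).$$

Let

$$\Delta_0 = m_0, \quad \Delta_1 = \begin{vmatrix} m_0 & m_1\\ m_1 & m_2 \end{vmatrix}, \quad \Delta_2 = \begin{vmatrix} m_0 & m_1 & m_2\\ m_1 & m_2 & m_3\\ m_2 & m_3 & m_4 \end{vmatrix}, \quad \ldots, \quad \Delta_n = \begin{vmatrix} m_0 & m_1 & \cdots & m_n\\ m_1 & m_2 & \cdots & m_{n+1}\\ \vdots & \vdots & & \vdots\\ m_n & m_{n+1} & \cdots & m_{2n} \end{vmatrix}.$$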

If $\varphi(x)$ has not less than $n + 1$ points of increase, we must have

$$\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots, \quad \Delta_n > 0,$$

and conversely, if these inequalities are satisfied, $\varphi(x)$ has at least $n + 1$ points of increase. To prove this, consider the quadratic form

$$\Phi = \int_a^b (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x)$$

in $n + 1$ variables $t_0, t_1, \ldots, t_n$. Evidently

$$\Phi = \sum m_{i+j}\,t_it_j \qquad (i, j = 0, 1, 2, \ldots, n),$$

so that $\Delta_n$ is the determinant of $\Phi$ and $\Delta_0, \Delta_1, \ldots, \Delta_{n-1}$ its principal minors. The form cannot vanish unless $t_0 = t_1 = \cdots = t_n = 0$. For if $x = \xi$ is a point of increase and $\Phi = 0$, we must have also

$$\int_{\xi-\epsilon}^{\xi+\epsilon} (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) = 0$$

for an arbitrary positive $\epsilon$, whence by the mean value theorem

$$(t_0 + t_1\xi' + \cdots + t_n\xi'^n)^2\,[\varphi(\xi + \epsilon) - \varphi(\xi - \epsilon)] = 0, \qquad \xi - \epsilon \le \xi' \le \xi + \epsilon,$$

or

$$t_0 + t_1\xi' + \cdots + t_n\xi'^n = 0$$

because

$$\varphi(\xi + \epsilon) - \varphi(\xi - \epsilon) > 0.$$

Letting $\epsilon$ converge to 0, we conclude

$$t_0 + t_1\xi + \cdots + t_n\xi^n = 0$$

at any point of increase. Since there are at least $n + 1$ points of increase, the equation

$$t_0 + t_1x + \cdots + t_nx^n = 0$$

would have at least $n + 1$ roots and that necessitates

$$t_0 = t_1 = \cdots = t_n = 0.$$

Hence, the quadratic form $\Phi$, which is never negative, can vanish only if all its variables vanish; that is, $\Phi$ is a definite positive form. Its determinant $\Delta_n$ and all its principal minors $\Delta_{n-1}, \Delta_{n-2}, \ldots, \Delta_0$ must be positive, which proves the first statement.

Suppose the conditions

$$\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots, \quad \Delta_n > 0$$

are satisfied and let $\varphi(x)$ have $s < n + 1$ points of increase. Then the integral representing $\Phi$ reduces to a finite sum

$$\Phi = p_1(t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n)^2 + p_2(t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n)^2 + \cdots + p_s(t_0 + t_1\xi_s + \cdots + t_n\xi_s^n)^2,$$

denoting by $p_1, p_2, \ldots, p_s$ the masses concentrated in the $s$ points of increase $\xi_1, \xi_2, \ldots, \xi_s$. Now, since $s \le n$, constants $t_0, t_1, \ldots, t_n$, not all zero, can be determined by the system of $s$ equations

$$\begin{aligned} t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n &= 0\\ t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n &= 0\\ &\;\;\vdots\\ t_0 + t_1\xi_s + \cdots + t_n\xi_s^n &= 0. \end{aligned}$$

Thus $\Phi$ vanishes when not all variables vanish; hence, its determinant $\Delta_n = 0$, contrary to hypothesis.
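The criterion can be watched at work on a small point distribution. In the sketch below (our own example; the helper names are hypothetical), masses $\tfrac14, \tfrac12, \tfrac14$ are placed at the three points $-1, 0, 2$. The code computes the moments exactly and evaluates $\Delta_0, \ldots, \Delta_3$ by Gaussian elimination over the rationals. With $n = 2$ the first three determinants are positive, while $\Delta_3 = 0$ because only three points of increase are available for a form in four variables.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if a[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def hankel_dets(m, n):
    """Delta_0, ..., Delta_n built from the moments m_0, ..., m_{2n}."""
    return [det([[m[i + j] for j in range(k + 1)] for i in range(k + 1)]) for k in range(n + 1)]

points = [Fraction(-1), Fraction(0), Fraction(2)]
masses = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
moments = [sum(p * x ** k for x, p in zip(points, masses)) for k in range(7)]
print(hankel_dets(moments, 3))
```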

From now on we shall assume that $\varphi(x)$ has at least $n + 1$ points of increase. The integral

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

can be expanded into a formal power series of $1/z$, thus

$$\int_a^b \frac{d\varphi(x)}{z - x} = \frac{m_0}{z} + \frac{m_1}{z^2} + \frac{m_2}{z^3} + \cdots + \frac{m_{2n}}{z^{2n+1}} + \cdots$$

and this power series can be converted into a continued fraction as explained in Sec. 2. Let

$$\frac{P_1}{Q_1}, \quad \frac{P_2}{Q_2}, \quad \ldots, \quad \frac{P_n}{Q_n}, \quad \frac{P_{n+1}}{Q_{n+1}}$$

be the first $n + 1$ convergents to that continued fraction. I say that the degrees of their denominators are, respectively, $1, 2, 3, \ldots, n + 1$. Since these degrees form an increasing sequence, it suffices to show that there exists a convergent with the denominator of a given degree

$$s \le n + 1.$$

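This convergent $P/Q$ is completely determined by the condition that in a formal expansion of the difference

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)}$$

into a power series of $1/z$, terms involving $1/z, 1/z^2, \ldots, 1/z^{2s}$ are absent. This is the same as to say that in the expansion of

$$Q(z)\int_a^b \frac{d\varphi(x)}{z - x} - P(z)$$

there are no terms involving $1/z, 1/z^2, \ldots, 1/z^s$. The preceding expression can be written thus:

$$\int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z) + \int_a^b \frac{Q(x)\,d\varphi(x)}{z - x}.$$

Since

$$\int_a^b \frac{Q(x)\,d\varphi(x)}{z - x} = \frac{1}{z}\int_a^b Q(x)\,d\varphi(x) + \frac{1}{z^2}\int_a^b xQ(x)\,d\varphi(x) + \cdots$$

contains only negative powers of $z$, while

$$\int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z)$$

is a polynomial in $z$, this polynomial must vanish identically. That gives

$$P(z) = \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x). \tag{4}$$

To determine $Q(z)$ we must express the conditions that in the expansion of

$$\int_a^b \frac{Q(x)\,d\varphi(x)}{z - x}$$

the terms in $1/z, 1/z^2, \ldots, 1/z^s$ vanish. These conditions are equivalent to $s$ relations

$$\int_a^b Q(x)\,d\varphi(x) = 0, \quad \int_a^b xQ(x)\,d\varphi(x) = 0, \quad \ldots, \quad \int_a^b x^{s-1}Q(x)\,d\varphi(x) = 0, \tag{5}$$

which in turn amount to the single requirement that

$$\int_a^b \theta(x)Q(x)\,d\varphi(x) = 0 \tag{6}$$

for an arbitrary polynomial $\theta(x)$ of degree $\le s - 1$.

Conversely, if there exists a polynomial $Q(z)$ of degree $s$ satisfying conditions (5), and $P(z)$ is determined by equation (4), then $P(z)/Q(z)$ is a convergent whose denominator is of degree $s$. For then the expansion of

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)}$$

lacks the terms in $1/z, 1/z^2, \ldots, 1/z^{2s}$.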
Let

$$Q(z) = l_0 + l_1z + l_2z^2 + \cdots + l_{s-1}z^{s-1} + z^s.$$

Then equations (5) become

$$\begin{aligned} m_0l_0 + m_1l_1 + m_2l_2 + \cdots + m_{s-1}l_{s-1} + m_s &= 0\\ m_1l_0 + m_2l_1 + m_3l_2 + \cdots + m_sl_{s-1} + m_{s+1} &= 0\\ &\;\;\vdots\\ m_{s-1}l_0 + m_sl_1 + m_{s+1}l_2 + \cdots + m_{2s-2}l_{s-1} + m_{2s-1} &= 0. \end{aligned}$$

This system of linear equations determines completely the coefficients $l_0, l_1, \ldots, l_{s-1}$, since its determinant $\Delta_{s-1} > 0$.

The existence of a convergent with the denominator of degree

$$s \le n + 1$$

being established, it follows that the denominator of the $s$th convergent $P_s/Q_s$ is exactly of degree $s$. The denominator $Q_s$ is determined, except for a constant factor $C$, and can be presented in the form:

$$Q_s(z) = C\begin{vmatrix} 1 & z & z^2 & \cdots & z^s\\ m_0 & m_1 & m_2 & \cdots & m_s\\ m_1 & m_2 & m_3 & \cdots & m_{s+1}\\ \vdots & \vdots & \vdots & & \vdots\\ m_{s-1} & m_s & m_{s+1} & \cdots & m_{2s-1} \end{vmatrix}.$$

A remarkable result follows from equation (6) by taking $Q = Q_s$ and $\theta = Q_{s'}$ with $s' < s$; namely,

$$\int_a^b Q_s(z)Q_{s'}(z)\,d\varphi(z) = 0 \qquad \text{if } s \ne s', \tag{7}$$

while

$$\int_a^b Q_s(z)^2\,d\varphi(z) > 0 \qquad (s \le n).$$

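In the general relation

$$Q_s = q_sQ_{s-1} - Q_{s-2}$$

the polynomial $q_s$ must be of the first degree,

$$q_s = \alpha_sz + \beta_s,$$

which shows that the continued fraction associated with

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

has the form

$$\cfrac{1}{\alpha_1z + \beta_1 - \cfrac{1}{\alpha_2z + \beta_2 - \cfrac{1}{\alpha_3z + \beta_3 - \cdots}}}$$

The next question is how to determine the constants $\alpha_s$ and $\beta_s$. Multiplying both members of the equation

$$Q_s = (\alpha_sz + \beta_s)Q_{s-1} - Q_{s-2} \qquad (s \ge 2)$$

by $Q_{s-2}\,d\varphi(z)$, integrating between the limits $a$ and $b$, and taking into account (7), we get

$$0 = \alpha_s\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) - \int_a^b Q_{s-2}^2\,d\varphi(z).$$

On the other hand, the highest terms in $Q_{s-1}$ and $Q_{s-2}$ are

$$\alpha_1\alpha_2\cdots\alpha_{s-1}z^{s-1}, \qquad \alpha_1\alpha_2\cdots\alpha_{s-2}z^{s-2}.$$

Hence,

$$zQ_{s-2} = \frac{Q_{s-1}}{\alpha_{s-1}} + \omega,$$

where $\omega$ is a polynomial of degree $\le s - 2$. Referring to equation (6), we have

$$\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) = \frac{1}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z)$$

and consequently

$$\alpha_s = \alpha_{s-1}\,\frac{\int_a^b Q_{s-2}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}. \tag{8}$$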

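Suppose that the following moments are given: $m_0, m_1, \ldots, m_{2n}$; how many of the coefficients $\alpha_s$ can be found? Evidently $\alpha_1 = 1/m_0$, since the expansion begins with the term $m_0/z$. Furthermore, $Q_0 = 1$ and $Q_1$ is completely determined given $m_0$ and $m_1$. Relation (8) determines $\alpha_2$, and $Q_2$ will be completely determined given $m_0, m_1, m_2, m_3$. The same relation again determines $\alpha_3$, and $Q_3$ will be determined given $m_0, m_1, \ldots, m_5$. Proceeding in the same way, we conclude that, given $m_0, m_1, m_2, \ldots, m_{2n}$, all the polynomials

$$Q_0, Q_1, Q_2, \ldots, Q_n$$

as well as the constants

$$\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n+1}$$

can be determined. It is important to note that all these constants are positive. Moreover, since $\alpha_1 = 1/m_0 = 1/\int_a^b Q_0^2\,d\varphi(z)$, relation (8) gives by induction

$$\int_a^b Q_{s-1}^2\,d\varphi(z) = \frac{1}{\alpha_s} \qquad (s = 1, 2, \ldots, n + 1).$$

Proceeding in a similar manner (multiplying the recurrence by $Q_{s-1}\,d\varphi(z)$ instead), the following expression can be found:

$$\beta_s = -\alpha_s\,\frac{\int_a^b zQ_{s-1}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}.$$

It follows that the constants

$$\beta_1, \beta_2, \ldots, \beta_n$$

are determined by our data, but not $\beta_{n+1}$. For if $s = n + 1$, the integral

$$\int_a^b zQ_n^2\,d\varphi(z)$$

can be expressed as a linear function of $m_0, m_1, \ldots, m_{2n+1}$ with known coefficients. But $m_{2n+1}$ is not included among our data; hence, $\beta_{n+1}$ cannot be determined.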
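Relation (8), the starting value $\alpha_1 = 1/m_0$ and the expression for $\beta_s$ translate directly into a procedure that consumes $m_0, \ldots, m_{2n}$. The following is a minimal sketch in Python with exact rational arithmetic. The function names are ours, not the author's. Every integral $\int_a^b \theta(z)\,d\varphi(z)$ of a polynomial $\theta$ is replaced by the corresponding linear combination of moments. For the normal moments of Sec. 1 with $n = 3$ it should return $\alpha = 1, 2, \tfrac12, \tfrac43$, all $\beta_s = 0$, and $Q_2 = 2z^2 - 1$.

```python
from fractions import Fraction

def integrate(poly, m):
    """Integral of poly(z) d phi(z) written through the moments; poly is lowest-degree-first."""
    return sum(c * m[k] for k, c in enumerate(poly))

def mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def sub(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def recurrence_from_moments(moments):
    """Return (alphas, betas, Qs): alpha_1..alpha_{n+1}, beta_1..beta_n, Q_0..Q_n."""
    m = [Fraction(x) for x in moments]
    n = (len(m) - 1) // 2
    z = [Fraction(0), Fraction(1)]
    Q_prev, Q = [Fraction(0)], [Fraction(1)]            # Q_{-1} = 0, Q_0 = 1
    alphas, betas, Qs = [], [], [Q]
    prev_norm = None
    for s in range(1, n + 2):
        norm = integrate(mul(Q, Q), m)                    # integral of Q_{s-1}^2
        if s == 1:
            alpha = 1 / m[0]                              # alpha_1 = 1/m_0
        else:
            alpha = alphas[-1] * prev_norm / norm         # relation (8)
        alphas.append(alpha)
        if s == n + 1:
            break                                         # beta_{n+1} would need m_{2n+1}
        beta = -alpha * integrate(mul(z, mul(Q, Q)), m) / norm
        betas.append(beta)
        Q_prev, Q = Q, sub(mul([beta, alpha], Q), Q_prev)  # Q_s = (alpha_s z + beta_s) Q_{s-1} - Q_{s-2}
        Qs.append(Q)
        prev_norm = norm
    return alphas, betas, Qs

normal = [Fraction(1), 0, Fraction(1, 2), 0, Fraction(3, 4), 0, Fraction(15, 8)]
alphas, betas, Qs = recurrence_from_moments(normal)
print(alphas, betas, Qs[2])
```

Exact fractions are used because the moments enter through sums with alternating signs; in floating point the recurrence loses accuracy quickly as $n$ grows.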

4. Properties of Polynomials $Q_s$. Theorem. Roots of the equation

$$Q_s(z) = 0 \qquad (s \le n)$$

are real, simple, and contained within the interval $(a, b)$.

Proof. Let $Q_s(z)$ change its sign $r < s$ times when $z$ passes through points $z_1, z_2, \ldots, z_r$ contained strictly within $(a, b)$. Setting

$$\theta(z) = (z - z_1)(z - z_2)\cdots(z - z_r),$$

the product

$$\theta(z)Q_s(z)$$

does not change its sign when $z$ increases from $a$ to $b$. However, since $\theta$ is of degree $r \le s - 1$, equation (6) gives

$$\int_a^b \theta(z)Q_s(z)\,d\varphi(z) = 0,$$

and this necessitates that $\theta(z)Q_s(z)$, or (the points $z_1, \ldots, z_r$ being themselves roots of $Q_s$) $Q_s(z)$, vanishes at all points of increase of $\varphi(z)$. But this is impossible, since by hypothesis there are at least $n + 1$ points of increase, whereas the degree $s$ of $Q_s$ does not exceed $n$. Consequently, $Q_s(z)$ changes its sign in the interval $(a, b)$ exactly $s$ times and has all its roots real, simple, and located within $(a, b)$.

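It follows from this theorem that the convergent

$$\frac{P_n}{Q_n}$$

can be resolved into a sum of simple fractions as follows:

$$\frac{P_n(z)}{Q_n(z)} = \frac{A_1}{z - z_1} + \frac{A_2}{z - z_2} + \cdots + \frac{A_n}{z - z_n}, \tag{9}$$

where $z_1, z_2, \ldots, z_n$ are the roots of the equation $Q_n(z) = 0$ and in general

$$A_\alpha = \frac{P_n(z_\alpha)}{Q_n'(z_\alpha)}.$$

The right member of (9) can be expanded into a power series of $1/z$, the coefficient of $1/z^{k+1}$ being

$$\sum_{\alpha=1}^{n} A_\alpha z_\alpha^k.$$

By the property of convergents we must have the following equations:

$$\sum_{\alpha=1}^{n} A_\alpha = m_0, \quad \sum_{\alpha=1}^{n} A_\alpha z_\alpha = m_1, \quad \ldots, \quad \sum_{\alpha=1}^{n} A_\alpha z_\alpha^{2n-1} = m_{2n-1}.$$

These equations can be condensed into one,

$$\sum_{\alpha=1}^{n} A_\alpha T(z_\alpha) = \int_a^b T(z)\,d\varphi(z), \tag{10}$$

which should hold for any polynomial $T(z)$ of degree $\le 2n - 1$. Let us take for $T(z)$ the polynomial of degree $2n - 2$

$$T(z) = \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2;$$

then

$$T(z_\alpha) = 1, \qquad T(z_\beta) = 0 \quad\text{if } \beta \ne \alpha,$$

and consequently, by virtue of equation (10),

$$A_\alpha = \int_a^b \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2 d\varphi(z) > 0.$$

Thus the constants $A_1, A_2, \ldots, A_n$ are all positive, which shows that $P_n(z_k)$ has the same sign as $Q_n'(z_k)$. Now, with the roots arranged in increasing order, in the sequence

$$Q_n'(z_1), \quad Q_n'(z_2), \quad \ldots, \quad Q_n'(z_n)$$

any two consecutive terms are of opposite signs. The same being true of the sequence

$$P_n(z_1), \quad P_n(z_2), \quad \ldots, \quad P_n(z_n),$$

it follows that the roots of $P_n(z)$ are all simple, real, and located in the intervals

$$(z_1, z_2), \quad (z_2, z_3), \quad \ldots, \quad (z_{n-1}, z_n).$$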

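Finally, we shall prove the following theorem:

Theorem. For any real $x$

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x)$$

is a positive number.

Proof. From the relations

$$Q_s(z) = (\alpha_sz + \beta_s)Q_{s-1}(z) - Q_{s-2}(z), \qquad Q_s(x) = (\alpha_sx + \beta_s)Q_{s-1}(x) - Q_{s-2}(x)$$

it follows that

$$\frac{Q_s(z)Q_{s-1}(x) - Q_s(x)Q_{s-1}(z)}{z - x} = \alpha_sQ_{s-1}(z)Q_{s-1}(x) + \frac{Q_{s-1}(z)Q_{s-2}(x) - Q_{s-1}(x)Q_{s-2}(z)}{z - x},$$

whence, taking $s = 1, 2, 3, \ldots, n$ (with $Q_{-1} = 0$) and adding the results,

$$\frac{Q_n(z)Q_{n-1}(x) - Q_n(x)Q_{n-1}(z)}{z - x} = \sum_{s=1}^{n} \alpha_sQ_{s-1}(z)Q_{s-1}(x).$$

It suffices now to let $z$ tend to $x$ to arrive at the identity

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) = \sum_{s=1}^{n} \alpha_sQ_{s-1}(x)^2.$$

Since $Q_0 = 1$ and $\alpha_s > 0$, it is evident that

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) > 0$$

for every real $x$.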
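The identity just obtained is easy to test. The sketch below assumes, as a concrete case, the normal moments of Sec. 1. For them the constants $c_n$ of Sec. 7 give $\beta_s = 0$, $\alpha_1 = 1$, $\alpha_2 = 2$, $\alpha_3 = \tfrac12$ and $Q_0 = 1$, $Q_1 = x$, $Q_2 = 2x^2 - 1$, $Q_3 = x^3 - \tfrac32x$. The code compares both sides exactly at several points for $n = 3$; in this case both sides equal $2x^4 + \tfrac32$. The helper names are ours.

```python
from fractions import Fraction as F

def evaluate(poly, x):
    """Horner evaluation; poly is lowest-degree-first."""
    result = F(0)
    for c in reversed(poly):
        result = result * x + c
    return result

def derivative(poly):
    return [k * c for k, c in enumerate(poly)][1:]

# Normal case, n = 3 (all beta_s = 0); coefficient lists are lowest-degree-first.
alphas = [F(1), F(2), F(1, 2)]
Q = [[F(1)], [F(0), F(1)], [F(-1), F(0), F(2)], [F(0), F(-3, 2), F(0), F(1)]]
n = 3

def lhs(x):
    # Q_n'(x) Q_{n-1}(x) - Q_{n-1}'(x) Q_n(x)
    return (evaluate(derivative(Q[n]), x) * evaluate(Q[n - 1], x)
            - evaluate(derivative(Q[n - 1]), x) * evaluate(Q[n], x))

def rhs(x):
    # sum over s = 1..n of alpha_s Q_{s-1}(x)^2
    return sum(alphas[s - 1] * evaluate(Q[s - 1], x) ** 2 for s in range(1, n + 1))

for x in [F(-2), F(-1, 3), F(0), F(5, 7), F(3)]:
    print(x, lhs(x), rhs(x), lhs(x) == rhs(x))
```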

5. Equivalent Point Distributions. If the whole mass can be concentrated in a finite number of points so as to produce the same $l$ first moments as a given distribution, we have an “equivalent point distribution” in respect to the $l$ first moments. In what follows we shall suppose that the whole mass is spread over the infinite interval $(-\infty, \infty)$ and that the given moments, originating in a distribution with at least $n + 1$ points of increase, are

$$m_0, m_1, m_2, \ldots, m_{2n}.$$

The question is: Is it possible to find an equivalent point distribution where the whole mass is concentrated in $n + 1$ points? Let the unknown points be

$$\xi_1, \xi_2, \ldots, \xi_{n+1}$$

and the masses concentrated in them

$$A_1, A_2, \ldots, A_{n+1}.$$

Evidently the question will be answered in the affirmative if the system of $2n + 1$ equations

$$\sum_{\alpha=1}^{n+1} A_\alpha = m_0, \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha = m_1, \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^2 = m_2, \quad \ldots, \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^{2n} = m_{2n} \tag{A}$$

can be satisfied by real numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$; $A_1, A_2, \ldots, A_{n+1}$, the last $n + 1$ numbers being positive. The number of unknowns being greater by one unit than the number of equations, we can introduce the additional requirement that one of the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ should be equal to a given real number $v$. The system (A) may be replaced by the single requirement that the equation

$$\sum_{\alpha=1}^{n+1} A_\alpha T(\xi_\alpha) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x) \tag{11}$$

shall hold for any polynomial $T(x)$ of degree $\le 2n$. Let $Q(x)$ be the polynomial of degree $n + 1$ having roots $\xi_1, \xi_2, \ldots, \xi_{n+1}$ and let $\theta(x)$ be an arbitrary polynomial of degree $\le n - 1$. Then we can apply equation (11) to

$$T(x) = \theta(x)Q(x).$$
Since $Q(\xi_\alpha) = 0$, we shall have

$$\int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0 \tag{12}$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 1$. Presently we shall see that requirement (12) together with $Q(v) = 0$ determines $Q(x)$, save for a constant factor, if

$$Q_n(v) \ne 0.$$

Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x),$$

where $R_{n-1}(x)$ is a polynomial of degree $\le n - 1$. If $\theta(x)$ is an arbitrary polynomial of degree $\le n - 2$,

$$(\lambda x + \mu)\theta(x)$$

will be of degree $\le n - 1$. Hence

$$\int_{-\infty}^{\infty} (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 2$. The last requirement shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$

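In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine the constants $\lambda$ and $\mu$. Multiplying both members by $Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get, on account of (12) and (7),

$$\lambda\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)$$

or

$$\frac{\lambda}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But, as follows from (8) together with $\alpha_1 = 1/m_0$,

$$\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x) = \frac{1}{\alpha_{n+1}}, \qquad \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x) = \frac{1}{\alpha_n},$$

whence

$$\lambda = \alpha_{n+1}.$$

The equation

$$0 = Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v)$$

serves to determine $\mu$ if $Q_n(v) \ne 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]Q_n(x) - Q_{n-1}(x).$$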


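Owing to the recurrence relations

$$Q_2 = (\alpha_2x + \beta_2)Q_1 - Q_0, \quad Q_3 = (\alpha_3x + \beta_3)Q_2 - Q_1, \quad \ldots, \quad Q_n = (\alpha_nx + \beta_n)Q_{n-1} - Q_{n-2},$$

and to the fact that $Q$ itself is of the form $(\alpha_{n+1}x + \mu)Q_n - Q_{n-1}$ with $\alpha_{n+1} > 0$, it is evident that

$$Q, \quad Q_n, \quad Q_{n-1}, \quad \ldots, \quad Q_1, \quad Q_0 = 1$$

is a Sturm series. For $x = -\infty$ it contains $n + 1$ variations and for $x = +\infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots, and among them $v$. Thus, if the problem is solvable, the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ are determined as the roots of

$$Q(x) = 0.$$

Furthermore, all unknowns $A_\alpha$ will be positive. In fact, from equation (11) it follows that

$$A_\alpha = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_\alpha)Q'(\xi_\alpha)}\right]^2 d\varphi(x) > 0.$$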

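Now we must show that the constants $A_\alpha$ can actually be determined so as to satisfy equations (A). To this end let

$$P(x) = \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]P_n(x) - P_{n-1}(x).$$

Then, since by (4) and $\int_{-\infty}^{\infty} Q_n\,d\varphi = 0$ we have $P(x) = \int_{-\infty}^{\infty} \frac{Q(x) - Q(z)}{x - z}\,d\varphi(z)$,

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)\,d\varphi(z)}{x - z}$$

and, on account of (12), the expansion of the right member into a power series of $1/x$ lacks the terms in $1/x, 1/x^2, \ldots, 1/x^n$. Hence, the expansion of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$

lacks the terms in $1/x, 1/x^2, \ldots, 1/x^{2n+1}$; that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots.$$

On the other hand, resolving $P(x)/Q(x)$ into simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into a power series of $1/x$ and comparing with the preceding expansion, we obtain the system (A). By the previous remark all constants $A_\alpha$ are positive. Thus, there exists a point distribution in which masses concentrated in $n + 1$ points produce the moments $m_0, m_1, \ldots, m_{2n}$. One of these points, $v$, may be taken arbitrarily, provided, however, that the condition

$$Q_n(v) \ne 0$$

is observed.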
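The construction of this section can be followed numerically. The sketch below is our own illustration, not the author's procedure, and the name `equivalent_points` is hypothetical. It assumes that $Q_{n-1}$, $Q_n$ and $\alpha_{n+1}$ are already known, for instance from the recurrence of Sec. 3. It forms $Q(x)$ as above and takes its roots as the points $\xi_\alpha$. Each mass is then obtained from $A_\alpha = \int [Q(x)/((x - \xi_\alpha)Q'(\xi_\alpha))]^2\,d\varphi(x)$, with the integral written through $m_0, \ldots, m_{2n}$. It uses NumPy's `numpy.polynomial.polynomial` routines, with lowest-degree-first coefficient arrays and floating-point arithmetic.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def equivalent_points(moments, Q_nm1, Q_n, alpha_np1, v):
    """Points xi_1 < ... < xi_{n+1} (one equal to v) and positive masses reproducing m_0..m_{2n}."""
    qn_v = P.polyval(v, Q_n)
    if qn_v == 0:
        raise ValueError("Q_n(v) must be nonzero")
    # Q(x) = [alpha_{n+1}(x - v) + Q_{n-1}(v)/Q_n(v)] Q_n(x) - Q_{n-1}(x)
    linear = np.array([-alpha_np1 * v + P.polyval(v, Q_nm1) / qn_v, alpha_np1])
    Q = P.polysub(P.polymul(linear, Q_n), Q_nm1)
    xis = np.sort(P.polyroots(Q).real)
    dQ = P.polyder(Q)
    masses = []
    for xi in xis:
        ell, _ = P.polydiv(Q, np.array([-xi, 1.0]))    # Q(x) / (x - xi)
        ell = ell / P.polyval(xi, dQ)                   # equals 1 at xi, 0 at the other roots
        sq = P.polymul(ell, ell)                        # degree 2n
        masses.append(sum(c * moments[k] for k, c in enumerate(sq)))
    return xis, np.array(masses)

# Normal moments of Sec. 1 with n = 2: Q_1 = x, Q_2 = 2x^2 - 1, alpha_3 = 1/2; prescribe v = 1.
moments = [1.0, 0.0, 0.5, 0.0, 0.75]
xis, A = equivalent_points(moments, [0.0, 1.0], [-1.0, 0.0, 2.0], 0.5, 1.0)
print(xis, A)
print([float(np.sum(A * xis ** k)) for k in range(5)])
```

With $v = 1$ one gets $Q(x) = x^3 + x^2 - \tfrac32x - \tfrac12$, whose roots are $1$ and $-1 \pm \sqrt{1/2}$. The second printed line should reproduce $m_0, \ldots, m_4$ up to rounding.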

6. Tshebysheff’s Inequalities. In a note referred to in the introduction
Tshebysheff made known certain inequalities of the utmost importance
for the theory we are concerned with. The first very ingenious
proof of them was given by Markoff in 1884 and, by a remarkable
coincidence, the same proof was rediscovered almost at the same time
by Stieltjes. A few years later, Stieltjes found another totally different
proof; and it is this second proof that we shall follow.

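Let $\varphi(x)$ be a distribution function of a mass spread over the interval $(-\infty, \infty)$. Supposing that a moment of the order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim l^i\,[m_0 - \varphi(l)] = 0, \qquad \lim l^i\,\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \ge l^i\int_l^{\infty} d\varphi(x) = l^i\,[\varphi(+\infty) - \varphi(l)]$$

or

$$l^i\,[m_0 - \varphi(l)] \le \int_l^{\infty} x^i\,d\varphi(x).$$

Similarly

$$\left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right| \ge l^i\int_{-\infty}^{-l} d\varphi(x) = l^i\,\varphi(-l)$$

or

$$l^i\,\varphi(-l) \le \left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right|.$$

Now both integrals

$$\int_l^{\infty} x^i\,d\varphi(x) \quad\text{and}\quad \int_{-\infty}^{-l} x^i\,d\varphi(x)$$

converge to 0 as $l$ tends to $+\infty$; whence both statements follow immediately. Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i\,[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]\,x^{i-1}\,dx,$$

$$\int_{-l}^0 x^i\,d\varphi(x) = (-1)^{i-1}\,l^i\,\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = -i\int_0^{\infty} [\varphi(x) - m_0]\,x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to the law characterized by the function $\psi(x)$, we shall have

$$m_i = \int_{-\infty}^{\infty} x^i\,d\psi(x) = -i\int_0^{\infty} [\psi(x) - m_0]\,x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$\int_{-\infty}^{\infty} x^{i-1}\,[\varphi(x) - \psi(x)]\,dx = 0. \tag{13}$$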
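The representation of $m_i$ through $\varphi(x)$ obtained by partial integration can be checked numerically. The sketch below is our own illustration with hypothetical helper names. It takes the normal law $\varphi(x) = \tfrac12(1 + \operatorname{erf} x)$, for which $m_0 = 1$. It substitutes $x = -y$ in the integral over $(-\infty, 0)$ so that both integrals run over $(0, \infty)$, truncates at $y = 10$ (an assumption, justified by the rapid decay of the tails) and applies Simpson's rule. The results should agree with $m_2 = \tfrac12$, $m_4 = \tfrac34$, $m_6 = \tfrac{15}{8}$ and with vanishing odd moments.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def moment_from_distribution(phi, m0, i, upper=10.0):
    """m_i = i * integral over (0, inf) of {[m_0 - phi(y)] y^(i-1) - (-y)^(i-1) phi(-y)} dy, i >= 1."""
    integrand = lambda y: (m0 - phi(y)) * y ** (i - 1) - (-y) ** (i - 1) * phi(-y)
    return i * simpson(integrand, 0.0, upper)

normal_phi = lambda x: 0.5 * (1.0 + math.erf(x))
for i in range(1, 7):
    print(i, moment_from_distribution(normal_phi, 1.0, i))
```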

Suppose the moments

$$m_0, m_1, m_2, \ldots, m_{2n}$$

of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$ has at least $n + 1$ points of increase, there exists an equivalent point distribution, defined in Sec. 5 and characterized by the step function $\psi(x)$ which can be defined as follows:

$$\psi(x) = \begin{cases} 0 & \text{for } -\infty < x < \xi_1,\\ A_1 & \text{for } \xi_1 \le x < \xi_2,\\ A_1 + A_2 & \text{for } \xi_2 \le x < \xi_3,\\ \quad\vdots & \\ A_1 + A_2 + \cdots + A_i & \text{for } \xi_i \le x < \xi_{i+1},\\ \quad\vdots & \\ A_1 + A_2 + \cdots + A_{n+1} & \text{for } \xi_{n+1} \le x < +\infty, \end{cases}$$

provided the roots $\xi_1, \xi_2, \ldots, \xi_{n+1}$ of the equation $Q(x) = 0$ are arranged in increasing order of magnitude.
Equation (13) will hold for $i = 1, 2, 3, \ldots, 2n$ or, which is the same, the equation

$$\int_{-\infty}^{\infty} \theta(x)\,[\varphi(x) - \psi(x)]\,dx = 0 \tag{14}$$

will hold for an arbitrary polynomial $\theta(x)$ of degree $\le 2n - 1$. The function

$$h(x) = \varphi(x) - \psi(x)$$

in general has ordinary discontinuities. We can prove now that $h(x)$, if not identically equal to 0 at all points of continuity, changes its sign at least $2n$ times.¹ Suppose, on the contrary, that it changes sign $r < 2n$ times; namely, at the points

$$a_1, a_2, \ldots, a_r.$$

Taking

$$\theta(x) = (x - a_1)(x - a_2)\cdots(x - a_r),$$

equation (14) will be satisfied, while the integrand

$$\theta(x)h(x),$$

if not 0, will be of the same sign, for example, positive. Let $\xi$ be any point of continuity of $h(x)$. If $\xi = a_i$ $(i = 1, 2, \ldots, r)$, then $h(a_i) = 0$, since $h(x)$ changes sign at $a_i$. If $\xi$ does not coincide with any one of the numbers $a_1, a_2, \ldots, a_r$, then for an arbitrarily small positive $\epsilon$ we must have

$$\int_{\xi-\epsilon}^{\xi+\epsilon} \theta(x)h(x)\,dx = 0.$$

But by continuity

$$\theta(x)h(x)$$

remains in the interval $(\xi - \epsilon, \xi + \epsilon)$, for sufficiently small $\epsilon$, above a certain positive number unless $h(\xi) = 0$. Thus, if $h(x)$ does not vanish at all points of continuity (in which case $\varphi(x)$ and $\psi(x)$ do not differ essentially), it must change sign at least $2n$ times. Let us see now where the change of sign can occur. In the intervals

$$(-\infty, \xi_1) \quad\text{and}\quad (\xi_{n+1}, +\infty)$$

$\varphi(x) - \psi(x)$ evidently cannot change sign. Within each of the intervals

$$(\xi_{i-1}, \xi_i) \qquad (i = 2, 3, \ldots, n + 1)$$

there can be at most one change of sign, since $\psi(x)$ remains constant there, and $\varphi(x)$ can only increase. The sign may change also at the points of discontinuity of $\psi(x)$, that is, at the points $\xi_1, \xi_2, \ldots, \xi_{n+1}$. Altogether, $\varphi(x) - \psi(x)$ cannot change sign more than $2n + 1$ times and not less than $2n$ times.

¹ A function $f(x)$ is said to change sign once in $(a, b)$ if in this interval there exists a point or points $c$ such that, for instance, $f(x) \ge 0$ in $(a, c)$ and $f(x) \le 0$ in $(c, b)$, equality signs not holding throughout the respective intervals. The change of sign occurs $n$ times if $(a, b)$ can be divided into $n$ intervals in which $f(x)$ changes sign once.

Since $\psi(x) = 0$ so far as $x < \xi_1$, and $\varphi(\xi_1 - \epsilon)$ is not negative for positive $\epsilon$, we must have

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) \ge 0.$$

Also $\psi(x) = m_0$ for $x > \xi_{n+1}$ and $\varphi(x) \le m_0$, so that

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) \le 0.$$

At first let us suppose

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

In this case $\varphi(x) - \psi(x)$ must change sign an odd number of times; that is, not less than $2n + 1$ times. Since this cannot happen more than $2n + 1$ times, the number of times $\varphi(x) - \psi(x)$ changes its sign must be exactly $2n + 1$. These changes occur once within each interval

$$(\xi_{i-1}, \xi_i)$$

and in each of the points $\xi_1, \xi_2, \ldots, \xi_{n+1}$. When the change of sign occurs in the interval $(\xi_{i-1}, \xi_i)$, where $\psi(x)$ remains constant, we must have, because $\varphi(x)$ never decreases, for sufficiently small $\epsilon$

$$\varphi(\xi_i - \epsilon) - \psi(\xi_i - \epsilon) > 0. \tag{15}$$

But the sign changes in passing the point $\xi_i$; therefore,

$$\varphi(\xi_i + \epsilon) - \psi(\xi_i + \epsilon) < 0. \tag{16}$$

The equalities

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

cannot both hold for all sufficiently small $\epsilon$. For then there would not be a change of sign at $\xi_1$ and $\xi_{n+1}$, so that the number of changes would not be greater than $2n - 1$, which is impossible. Therefore, at most one of them holds. Let first

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

Then there will be exactly $2n$ changes of sign: one in each of the intervals

$$(\xi_{i-1}, \xi_i)$$

and one in each of the points $\xi_2, \xi_3, \ldots, \xi_{n+1}$. The inequalities (15) and (16) will hold for $i \ge 2$, but

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_1 + \epsilon) - \psi(\xi_1 + \epsilon) < 0$$

for all sufficiently small $\epsilon$.

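Now let

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0$$

for all sufficiently small positive $\epsilon$. Then there will be exactly $2n$ changes of sign: one in each of the points $\xi_1, \xi_2, \ldots, \xi_n$ and one in each of the $n$ intervals

$$(\xi_{i-1}, \xi_i) \qquad (i = 2, 3, \ldots, n + 1).$$

The inequalities (15) and (16) will again hold for $i \le n$, but

$$\varphi(\xi_{n+1} - \epsilon) - \psi(\xi_{n+1} - \epsilon) > 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small $\epsilon$. Letting $\epsilon$ converge to 0, we shall have

$$\varphi(\xi_i - 0) \ge \psi(\xi_i - 0), \qquad \varphi(\xi_i + 0) \le \psi(\xi_i + 0)$$

for $i = 1, 2, 3, \ldots, n + 1$ in all cases. Then, since

$$\varphi(\xi_i) \ge \varphi(\xi_i - 0), \qquad \varphi(\xi_i) \le \varphi(\xi_i + 0),$$

we shall have also

$$\varphi(\xi_i) \ge \psi(\xi_i - 0), \qquad \varphi(\xi_i) \le \psi(\xi_i + 0)$$

or, taking into consideration the definition of the function $\psi(x)$,

$$\sum_{j=1}^{i-1} A_j \le \varphi(\xi_i) \le \sum_{j=1}^{i} A_j.$$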


These are the inequalities to which Tshebysheff’s name is justly attached. Since $A_\alpha = P(\xi_\alpha)/Q'(\xi_\alpha)$, for a particular root $\xi_k = v$ they can be written thus:

$$\varphi(v) \ge \sum_{\xi_\alpha < v} \frac{P(\xi_\alpha)}{Q'(\xi_\alpha)}, \qquad \varphi(v) \le \sum_{\xi_\alpha \le v} \frac{P(\xi_\alpha)}{Q'(\xi_\alpha)}, \tag{17}$$

with the evident meaning of the extent of summations. Another, less explicit, form of the same inequalities is

$$\psi(v - 0) \le \varphi(v) \le \psi(v + 0). \tag{18}$$

As to $P(x)$ and $Q(x)$, they can be taken in the form:

$$P(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]\,P_n(x) - Q_n(v)P_{n-1}(x),$$

$$Q(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]\,Q_n(x) - Q_n(v)Q_{n-1}(x).$$
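The inequalities (18) can be observed on the normal law of Sec. 7, for which $\varphi(x) = \tfrac12(1 + \operatorname{erf} x)$. The sketch below is ours and rests on several assumptions. It builds $Q_0, \ldots, Q_n$ for this law from the recurrence with $\beta_s = 0$ and $\alpha_s = -2c_s/c_{s-1}$, using the constants $c_s$ of Sec. 7. It forms $Q(x) = [\alpha_{n+1}(x - v) + Q_{n-1}(v)/Q_n(v)]Q_n(x) - Q_{n-1}(x)$ as in Sec. 5. The masses $A_\alpha$ come from their integral representation written through the moments. Finally, it compares $\varphi(v)$ with $\psi(v - 0)$ and $\psi(v + 0)$. Floating-point arithmetic is used, so the sketch is meant only for small $n$ and for $v$ not too close to a root of $Q_n$.

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

def normal_recurrence(n):
    """Q_0..Q_n (lowest-degree-first arrays) and alpha_1..alpha_{n+1} for phi(x) = (1 + erf x)/2."""
    c = [1.0, -0.5]                                      # c_0, c_1 of Sec. 7
    for k in range(2, n + 2):
        c.append(c[k - 2] / (2 * k - 2))                 # c_k = c_{k-2} / (2k - 2)
    alpha = [None] + [-2 * c[s] / c[s - 1] for s in range(1, n + 2)]
    Qs, prev = [np.array([1.0])], np.array([0.0])
    for s in range(1, n + 1):
        nxt = P.polysub(P.polymul([0.0, alpha[s]], Qs[-1]), prev)   # beta_s = 0
        prev = Qs[-1]
        Qs.append(nxt)
    return Qs, alpha

def normal_moments(n):
    return [0.0 if k % 2 else math.prod(range(1, k, 2)) / 2 ** (k // 2) for k in range(2 * n + 1)]

def tshebysheff_bracket(n, v, tol=1e-7):
    """Return (psi(v - 0), phi(v), psi(v + 0)) for the normal law; needs n >= 1 and Q_n(v) != 0."""
    Qs, alpha = normal_recurrence(n)
    m = normal_moments(n)
    Qn, Qnm1 = Qs[n], Qs[n - 1]
    linear = [-alpha[n + 1] * v + P.polyval(v, Qnm1) / P.polyval(v, Qn), alpha[n + 1]]
    Q = P.polysub(P.polymul(linear, Qn), Qnm1)
    xis = np.sort(P.polyroots(Q).real)
    dQ = P.polyder(Q)
    A = []
    for xi in xis:
        ell = P.polydiv(Q, [-xi, 1.0])[0] / P.polyval(xi, dQ)
        A.append(sum(cf * m[k] for k, cf in enumerate(P.polymul(ell, ell))))
    A = np.array(A)
    lower = A[xis < v - tol].sum()                       # psi(v - 0)
    upper = A[xis <= v + tol].sum()                      # psi(v + 0)
    return lower, 0.5 * (1.0 + math.erf(v)), upper

for n in (2, 4):
    for v in (-1.1, 0.3, 1.4):
        print(n, v, tshebysheff_bracket(n, v))
```

The width of each printed bracket is the mass placed at $v$. Sec. 6 goes on to bound it from above by the quantity $\chi_n(v)$.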

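Thus far we have assumed that $v$ was different from any root of the equation

$$Q_n(x) = 0,$$

but all the results hold even if

$$Q_n(v) = 0.$$

To prove this, we note first that when a variable $v$ approaches a root $\xi$ of $Q_n(x)$, one root of $Q(x)$ (either $\xi_1$ or $\xi_{n+1}$) tends to $-\infty$ or $+\infty$, while the remaining $n$ roots approach the $n$ roots $x_1, x_2, \ldots, x_n$ of the equation

$$Q_n(x) = 0.$$

If $\xi_1$ tends to negative infinity, it is easy to see that

$$\frac{P(\xi_1)}{Q'(\xi_1)}$$

tends to 0. In this case the other quotients

$$\frac{P(\xi_\alpha)}{Q'(\xi_\alpha)} \qquad (\alpha = 2, 3, \ldots, n + 1)$$

tend respectively to

$$\frac{P_n(x_1)}{Q_n'(x_1)}, \quad \frac{P_n(x_2)}{Q_n'(x_2)}, \quad \ldots, \quad \frac{P_n(x_n)}{Q_n'(x_n)}.$$

If $\xi_{n+1}$ tends to positive infinity, the quotients

$$\frac{P(\xi_\alpha)}{Q'(\xi_\alpha)} \qquad (\alpha = 1, 2, \ldots, n)$$

approach respectively

$$\frac{P_n(x_\alpha)}{Q_n'(x_\alpha)} \qquad (\alpha = 1, 2, 3, \ldots, n),$$

while

$$\frac{P(\xi_{n+1})}{Q'(\xi_{n+1})}$$

tends to 0. Now take $v = \xi - \epsilon$ and $v = \xi + \epsilon$ in (17) and let the positive number $\epsilon$ converge to 0. Taking into account the preceding remarks, we find in the limit

$$\varphi(\xi - 0) \ge \sum_{x_\alpha < \xi} \frac{P_n(x_\alpha)}{Q_n'(x_\alpha)}, \qquad \varphi(\xi + 0) \le \sum_{x_\alpha \le \xi} \frac{P_n(x_\alpha)}{Q_n'(x_\alpha)},$$

whence again

$$\sum_{x_\alpha < \xi} \frac{P_n(x_\alpha)}{Q_n'(x_\alpha)} \le \varphi(\xi) \le \sum_{x_\alpha \le \xi} \frac{P_n(x_\alpha)}{Q_n'(x_\alpha)}.$$

But these inequalities follow directly from (17) by taking $v = \xi$, since for $Q_n(\xi) = 0$ the above forms of $P(x)$ and $Q(x)$ reduce to $Q_{n-1}(\xi)P_n(x)$ and $Q_{n-1}(\xi)Q_n(x)$.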
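Since

$$\psi(v + 0) - \psi(v - 0) = \frac{P(v)}{Q'(v)},$$

it follows from inequalities (18) that

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{P(v)}{Q'(v)}.$$

On the other hand, one easily finds that

$$\frac{P(v)}{Q'(v)} = \frac{1}{\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v)}.$$

But, referring to the end of Sec. 4,

$$Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{i=1}^{n} \alpha_iQ_{i-1}(v)^2,$$

whence

$$\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{i=1}^{n+1} \alpha_iQ_{i-1}(v)^2 = Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v),$$

an expression which does not depend on the undetermined constant $\beta_{n+1}$. Finally,

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$

If $\varphi_1(v)$ is another distribution function with the same moments

$$m_0, m_1, m_2, \ldots, m_{2n},$$

we shall have also

$$0 \le \varphi_1(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}$$

and, as a consequence,

$$|\varphi_1(v) - \varphi(v)| \le \chi_n(v), \tag{19}$$

a very important inequality. Here for brevity we use the notation

$$\chi_n(v) = \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$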

7. Application to Normal Distribution. An important particular 
case is that of a normal distribution characterized by the function 


tp{x) = r e~^*du, 

V^J- -0 

In this case it is easy to give an explicit expression of the pol 3 momials 
Qn(x). Let 


Hn{x) = 


dx^ 


Integrating by parts, one can prove that for f ^ n — 1 


j n{x)dx = 0. 

Hence, one may conclude that Qn{x) differs from Unix) by a constant 
factor. Let 

Qn{x) = CnHn{x). 


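To determine $c_n$, we may use the relation

$$H_n(x)=-2xH_{n-1}(x)-2(n-1)H_{n-2}(x),$$

which can readily be established. Introducing the polynomials $Q_n$, this relation becomes

$$\frac{Q_n(x)}{c_n}=-2x\frac{Q_{n-1}(x)}{c_{n-1}}-2(n-1)\frac{Q_{n-2}(x)}{c_{n-2}}.$$

Hence,

$$Q_n(x)=-2x\frac{c_n}{c_{n-1}}Q_{n-1}(x)-2(n-1)\frac{c_n}{c_{n-2}}Q_{n-2}(x),$$

and comparison with the recurrence relation $Q_n(x)=(\alpha_nx+\beta_n)Q_{n-1}(x)-Q_{n-2}(x)$ gives

$$\alpha_n=-2\frac{c_n}{c_{n-1}};\qquad\beta_n=0;\qquad\frac{c_n}{c_{n-2}}=\frac{1}{2n-2}.$$

Since $H_0(x)=Q_0(x)=1$, we have $c_0=1$; also

$$\alpha_1=\frac{1}{m_0}=1=-2\frac{c_1}{c_0},$$

whence $c_1=-\tfrac12$. The knowledge of $c_0$ and $c_1$ together with the relation

$$\frac{c_n}{c_{n-2}}=\frac{1}{2n-2}$$

allows determination of all members of the sequence $c_2, c_3, c_4, \ldots$. The final expressions are as follows:

$$c_{2m}=\frac{1}{2^m\cdot1\cdot3\cdot5\cdots(2m-1)},\qquad c_{2m+1}=\frac{-1}{2^{m+1}\cdot2\cdot4\cdot6\cdots2m}.$$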

From the above relation between $H_n(x)$, $H_{n-1}(x)$, $H_{n-2}(x)$, and owing to the fact that $H_n(x)$ is an even or odd polynomial according as $n$ is even or odd, one finds

$$H_{2m}(0)=(-2)^m\cdot1\cdot3\cdot5\cdots(2m-1),$$

while another relation

$$H_n'(x)=-2nH_{n-1}(x),$$

following from the definition of $H_n(x)$, gives

$$H_{2m-1}'(0)=(-2)^m\cdot1\cdot3\cdot5\cdots(2m-1).$$

These preliminaries being established, we shall prove now that $\chi_n(v)$ attains its maximum for $v=0$. Let

$$\Omega(v)=H_{n+1}'(v)H_n(v)-H_n'(v)H_{n+1}(v).$$

Then, taking into account the differential equation for the polynomials $H_n$,

$$H_n''(v)=2vH_n'(v)-2nH_n(v),$$

we find that

$$\frac{d\Omega}{dv}=2v\Omega-2H_n(v)H_{n+1}(v).$$


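On the other hand, since $H_{n+1}'(v)=-2(n+1)H_n(v)$, denoting the roots of the polynomial $H_{n+1}(x)$ in general by $\xi$, we have

$$\frac{H_n(v)}{H_{n+1}(v)}=-\frac{1}{2n+2}\sum\frac{1}{v-\xi},\qquad\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)}=\frac{1}{2n+2}\sum\frac{1}{(v-\xi)^2}.$$

Consequently

$$\Omega(v)=-\frac{H_{n+1}(v)^2}{2n+2}\sum\frac{1}{(v-\xi)^2},\qquad-2H_n(v)H_{n+1}(v)=\frac{H_{n+1}(v)^2}{n+1}\sum\frac{1}{v-\xi},$$

and so

$$\frac{d\Omega}{dv}=-\frac{H_{n+1}(v)^2}{n+1}\sum\frac{\xi}{(v-\xi)^2}.$$

Roots of the polynomial $H_{n+1}(x)$ being symmetrically located with respect to 0, we have

$$\sum\frac{\xi}{(v-\xi)^2}=\frac12\sum\left[\frac{\xi}{(v-\xi)^2}-\frac{\xi}{(v+\xi)^2}\right]=2v\sum\frac{\xi^2}{(v^2-\xi^2)^2},$$

and finally

$$\frac{d\Omega}{dv}=-\frac{2vH_{n+1}(v)^2}{n+1}\sum\frac{\xi^2}{(v^2-\xi^2)^2}.$$

Hence

$$\frac{d\Omega}{dv}>0\ \text{ if }\ v<0;\qquad\frac{d\Omega}{dv}<0\ \text{ if }\ v>0;$$

that is, $\Omega(v)$ attains its maximum for $v=0$; and since $\chi_n(v)=1/[c_nc_{n+1}\Omega(v)]$ with $c_nc_{n+1}<0$ and $\Omega(v)<0$, $\chi_n(v)$ also attains its maximum for $v=0$. Referring to the above expressions of $c_{2m}$, $c_{2m+1}$, $H_{2m}(0)$, $H_{2m+1}'(0)$, we find that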


$$\chi_{2m}(0)=\frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)},\qquad\chi_{2m+1}(0)=\frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)}.$$

In Appendix I, page 354, we find the inequality 


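whence

$$\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m-1)}\cdot\frac{1}{\sqrt{4m+2}}<\frac{\sqrt{\pi}}{2},\qquad\frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)}<\sqrt{\frac{\pi}{4m+2}},$$

so that, for $n=2m$ as well as for $n=2m+1$,

$$\chi_n(v)\le\chi_n(0)<\sqrt{\frac{\pi}{4m+2}}.$$

Thus, in all cases

$$\chi_n(v)<\sqrt{\frac{\pi}{2n}},$$

whence, by virtue of inequality (19),

$$|\varphi_1(v)-\varphi(v)|<\sqrt{\frac{\pi}{2n}}.$$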
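The closed forms for $c_n$ and $\chi_n(0)$ can be checked mechanically. The Python sketch below is an illustration only (the function names are hypothetical): it rebuilds $c_n$ in exact rational arithmetic from $c_0=1$, $c_1=-\tfrac12$ and $c_n/c_{n-2}=1/(2n-2)$, compares the result with the closed expressions for $c_{2m}$ and $c_{2m+1}$, and tests $\chi_n(0)<\sqrt{\pi/(2n)}$ for a range of $n$.

```python
import math
from fractions import Fraction


def c_from_recurrence(n):
    """c_n obtained from c_0 = 1, c_1 = -1/2 and c_n / c_{n-2} = 1/(2n - 2)."""
    c = [Fraction(1), Fraction(-1, 2)]
    for k in range(2, n + 1):
        c.append(c[k - 2] / (2 * k - 2))
    return c[n]


def c_closed_form(n):
    """c_{2m} = 1/(2^m * 1*3*...*(2m-1)),  c_{2m+1} = -1/(2^(m+1) * 2*4*...*2m)."""
    m, odd = divmod(n, 2)
    if odd:
        return Fraction(-1, 2 ** (m + 1) * math.prod(range(2, 2 * m + 1, 2)))
    return Fraction(1, 2 ** m * math.prod(range(1, 2 * m, 2)))


def chi_at_zero(n):
    """chi_n(0) = (2*4*...*2m) / (3*5*...*(2m+1)), m = n // 2 (same value for n = 2m and 2m + 1)."""
    m = n // 2
    return Fraction(math.prod(range(2, 2 * m + 1, 2)), math.prod(range(3, 2 * m + 2, 2)))


for n in range(1, 8):
    bound = math.sqrt(math.pi / (2 * n))
    print(n, c_from_recurrence(n), float(chi_at_zero(n)), round(bound, 4))
```

Exact fractions are used so that the comparison of the recurrence with the closed forms is an identity check rather than a floating-point approximation.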

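Thus any distribution function $\varphi_1(v)$ with the moments

$$m_0=1,\qquad m_{2k-1}=0,\qquad m_{2k}=\frac{1\cdot3\cdot5\cdots(2k-1)}{2^k}\qquad(k\le n),$$

corresponding to

$$\varphi(v)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v}e^{-u^2}du,$$

differs from $\varphi(v)$ by less than

$$\sqrt{\frac{\pi}{2n}}.$$

Since this quantity tends to 0 when $n$ increases indefinitely, we have the following theorem, proved for the first time by Tshebysheff:

The system of infinitely many equations

$$\int_{-\infty}^{\infty}x^{2k-1}d\psi(x)=0,\qquad\int_{-\infty}^{\infty}x^{2k}d\psi(x)=\frac{1\cdot3\cdot5\cdots(2k-1)}{2^k};\qquad k=1,2,3,\ldots,$$

together with $\int_{-\infty}^{\infty}d\psi(x)=1$, uniquely determines a never decreasing function $\psi(x)$ such that $\psi(-\infty)=0$, namely

$$\psi(x)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{x}e^{-u^2}du.$$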
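As a quick numerical check of the moments entering Tshebysheff's theorem, the following Python sketch compares $1\cdot3\cdots(2k-1)/2^k$ with a trapezoidal-rule approximation of $\frac{1}{\sqrt\pi}\int x^{j}e^{-x^2}dx$. The truncation of the integral to $[-12,12]$ and the step count are arbitrary choices made for this illustration, not part of the text.

```python
import math


def normal_moment(j):
    """j-th moment of the density e^{-x^2}/sqrt(pi): 0 for odd j, 1*3*...*(j-1)/2^(j/2) for even j."""
    if j % 2:
        return 0.0
    return math.prod(range(1, j, 2)) / 2 ** (j // 2)


def normal_moment_numeric(j, a=12.0, steps=4000):
    """Composite trapezoidal rule for (1/sqrt(pi)) * integral_{-a}^{a} x^j e^{-x^2} dx."""
    h = 2 * a / steps
    total = 0.0
    for i in range(steps + 1):
        x = -a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** j * math.exp(-x * x)
    return total * h / math.sqrt(math.pi)


for j in range(9):
    print(j, normal_moment(j), round(normal_moment_numeric(j), 10))
```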


8. Tshebysheff-Markoff’s Fundamental Theorem. When a mass equal to 1 is distributed according to the law characterized by a function $F(x,\lambda)$ depending upon a parameter $\lambda$, we say that the distribution is variable. Notwithstanding the variability of the distribution, it may happen that its moments remain constant. If they are equal to the moments of the normal distribution with density

$$\frac{1}{\sqrt{\pi}}e^{-x^2},$$

then by the preceding theorem we have rigorously

$$F(x,\lambda)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{x}e^{-u^2}du,$$

no matter what $\lambda$ is.
Generally the moments of a variable distribution are themselves variable. Suppose that each one of them, when $\lambda$ tends to a certain limit (for instance $\infty$), tends to the corresponding moment of the normal distribution. One can foresee that under such circumstances $F(x,\lambda)$ will tend to

$$\varphi(x)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{x}e^{-u^2}du.$$

In fact, the following fundamental theorem holds:

Fundamental Theorem. If, for a variable distribution characterized by the function $F(x,\lambda)$,

$$\lim_{\lambda\to\infty}\int_{-\infty}^{\infty}x^k\,dF(x,\lambda)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-x^2}x^k\,dx$$

for any fixed $k=0,1,2,3,\ldots$, then

$$\lim_{\lambda\to\infty}F(v,\lambda)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v}e^{-x^2}dx$$

uniformly in $v$.

Proof. Let

$$m_0, m_1, m_2, \ldots, m_{2n}$$

be $2n+1$ moments corresponding to a normal distribution. They allow formation of the polynomials

$$Q_0(x), Q_1(x), \ldots, Q_n(x)\quad\text{and}\quad Q(x)$$

and the function designated in Sec. 6 by $\psi(x)$. Similar entities corresponding to the variable distribution will be specified by an asterisk. Since

$$m_k^*\to m_k\quad\text{as}\quad\lambda\to\infty$$

and since $\Delta_n>0$, we shall have

$$\Delta_n^*>0$$

for sufficiently large $\lambda$. Then $F(x,\lambda)$ will have not less than $n+1$ points of increase, and the whole theory can be applied to the variable distribution. In particular, we shall have

$$0\le\varphi(v)-\psi(v-0)\le\chi_n(v),\qquad0\le F(v,\lambda)-\psi^*(v-0)\le\chi_n^*(v). \tag{20}$$

Now $Q_s^*(x)$ $(s=0,1,2,\ldots,n)$ and $Q^*(x)$ depend rationally upon $m_k^*$ $(k=0,1,2,\ldots,2n)$; hence, without any difficulty one can see that

$$Q_s^*(x)\to Q_s(x)\quad(s=0,1,2,\ldots,n),\qquad Q^*(x)\to Q(x)$$

as $\lambda\to\infty$; whence

$$\chi_n^*(v)\to\chi_n(v).$$

Again

$$\psi^*(v-0)\to\psi(v-0)$$

as $\lambda\to\infty$. A few explanations are necessary to prove this. At first let $Q_n(v)\ne0$. Then the polynomial $Q(x)$ will have $n+1$ roots

$$\xi_1<\xi_2<\xi_3<\cdots<\xi_{n+1}.$$

Since the roots of an algebraic equation vary continuously with its coefficients, it is evident that for sufficiently large $\lambda$ the equation

$$Q^*(x)=0$$

will have $n+1$ roots

$$\xi_1^*<\xi_2^*<\xi_3^*<\cdots<\xi_{n+1}^*,$$

and $\xi_i^*$ will tend to $\xi_i$ as $\lambda\to\infty$. In this case, it is evident that $\psi^*(v-0)$ will tend to $\psi(v-0)$. If $Q_n(v)=0$, it may happen that $\xi_1^*$ or $\xi_{n+1}^*$ tends respectively to $-\infty$ or $+\infty$ as $\lambda\to\infty$, while the other roots tend to the roots

$$x_1, x_2, \ldots, x_n$$

of the equation

$$Q_n(x)=0.$$

But the terms in $\psi^*(v-0)$ corresponding to infinitely increasing roots tend to 0, and again

$$\psi^*(v-0)\to\psi(v-0).$$

Now

$$\chi_n(v)<\sqrt{\frac{\pi}{2n}}.$$

Consequently, given an arbitrary positive number $\varepsilon$, we can select $n$ so large as to have

$$\chi_n(v)<\sqrt{\frac{\pi}{2n}}<\varepsilon.$$

Having selected $n$ in this manner, we shall keep it fixed. Then by the preceding remarks a number $L$ can be found so that

$$|\psi^*(v-0)-\psi(v-0)|<\varepsilon,\qquad\chi_n^*(v)<2\varepsilon$$

for $\lambda>L$. Combining this with inequalities (20), we find

$$|F(v,\lambda)-\varphi(v)|<3\varepsilon$$

for $\lambda>L$. And this proves the convergence of $F(v,\lambda)$ to $\varphi(v)$ for a fixed arbitrary $v$. To show that the equation

$$\lim_{\lambda\to\infty}F(v,\lambda)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v}e^{-x^2}dx$$

holds uniformly for a variable $v$, we can follow a very simple reasoning due to Pólya. Since $\varphi(-\infty)=0$, $\varphi(+\infty)=1$, and $\varphi(x)$ is an increasing function, one can determine two numbers $a_0$ and $a_n$ so that


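$$\varphi(x)\le\varphi(a_0)<\frac{\varepsilon}{2}\quad\text{for}\quad x\le a_0,$$

$$1-\varphi(x)\le1-\varphi(a_n)<\frac{\varepsilon}{2}\quad\text{for}\quad x\ge a_n.$$

Next, because $\varphi(x)$ is a continuous function, the interval $(a_0,a_n)$ can be subdivided into partial intervals by inserting between $a_0$ and $a_n$ points $a_1<a_2<\cdots<a_{n-1}$ so that

$$0<\varphi(a_{k+1})-\varphi(a_k)<\frac{\varepsilon}{2}$$

for $k=0,1,2,\ldots,n-1$. By the preceding result, for all sufficiently large $\lambda$

$$F(a_0,\lambda)<\frac{\varepsilon}{2},\qquad1-F(a_n,\lambda)<\frac{\varepsilon}{2}$$

and

$$|F(a_k,\lambda)-\varphi(a_k)|<\frac{\varepsilon}{2};\qquad k=1,2,\ldots,n-1.$$

Now consider the interval $(-\infty,a_0)$. Here for $v\le a_0$

$$0\le F(v,\lambda)<\frac{\varepsilon}{2},\qquad0<\varphi(v)<\frac{\varepsilon}{2},$$

and

$$|F(v,\lambda)-\varphi(v)|<\varepsilon.$$

For $v$ belonging to the interval $(a_n,+\infty)$

$$0\le1-F(v,\lambda)<\frac{\varepsilon}{2},\qquad0<1-\varphi(v)<\frac{\varepsilon}{2},$$

whence again

$$|F(v,\lambda)-\varphi(v)|<\varepsilon.$$

Finally, let

$$a_k\le v<a_{k+1}\qquad(k=0,1,2,\ldots,n-1).$$

Then

$$F(v,\lambda)-\varphi(v)\ge F(a_k,\lambda)-\varphi(a_{k+1})=[F(a_k,\lambda)-\varphi(a_k)]+[\varphi(a_k)-\varphi(a_{k+1})],$$

$$F(v,\lambda)-\varphi(v)\le F(a_{k+1},\lambda)-\varphi(a_k)=[F(a_{k+1},\lambda)-\varphi(a_{k+1})]+[\varphi(a_{k+1})-\varphi(a_k)].$$

But

$$F(a_k,\lambda)-\varphi(a_k)>-\frac{\varepsilon}{2},\qquad\varphi(a_k)-\varphi(a_{k+1})>-\frac{\varepsilon}{2},$$

$$F(a_{k+1},\lambda)-\varphi(a_{k+1})<\frac{\varepsilon}{2},\qquad\varphi(a_{k+1})-\varphi(a_k)<\frac{\varepsilon}{2},$$

whence

$$-\varepsilon<F(v,\lambda)-\varphi(v)<\varepsilon.$$

Thus, given $\varepsilon$, there exists a number $L(\varepsilon)$ depending upon $\varepsilon$ alone and such that

$$|F(v,\lambda)-\varphi(v)|<\varepsilon$$

for $\lambda>L(\varepsilon)$, no matter what value is attributed to $v$.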

The fundamental theorem with reference to probability can be stated as follows:

Let $s_n$ be a stochastic variable depending upon a variable positive integer $n$. If the mathematical expectation $E(s_n^k)$ for any fixed $k=1,2,3,\ldots$ tends, as $n$ increases indefinitely, to the corresponding expectation

$$E(x^k)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}x^ke^{-x^2}dx$$

of a normally distributed variable, then the probability of the inequality

$$s_n<v$$

tends to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v}e^{-x^2}dx,$$

and that uniformly in $v$.

In very many cases it is much easier to make sure that the conditions of this theorem are fulfilled and then, in one stroke, to pass to the limit theorem for probability, than to attack the problem directly.
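A small illustration of the theorem's hypothesis, under assumptions of our own choosing: take $s_n=(\epsilon_1+\cdots+\epsilon_n)/\sqrt{2n}$, where the $\epsilon_i$ are independent and equal to $\pm1$ with probability $\frac12$ each, so that the variance matches the density $e^{-x^2}/\sqrt\pi$. The Python sketch computes $E(s_n^k)$ exactly from the binomial distribution and shows the even moments approaching $1\cdot3\cdots(k-1)/2^{k/2}$ (odd moments vanish by symmetry).

```python
from fractions import Fraction
from math import comb


def sum_moment(n, k):
    """E[S^k] for S = e_1 + ... + e_n, the e_i independent and equal to +1 or -1 with probability 1/2."""
    return sum(Fraction(comb(n, j), 2 ** n) * (2 * j - n) ** k for j in range(n + 1))


def scaled_even_moment(n, k):
    """E[s_n^k] with s_n = S / sqrt(2n); k must be even so the result stays rational."""
    if k % 2:
        raise ValueError("k must be even")
    return sum_moment(n, k) / Fraction(2 * n) ** (k // 2)


for n in (10, 100, 1000):
    print(n, float(scaled_even_moment(n, 4)), float(scaled_even_moment(n, 6)))
# The limits are 3/4 = 0.75 and 15/8 = 1.875, the normal moments 1*3/2^2 and 1*3*5/2^3.
```

By the theorem above, convergence of all such moments is enough to conclude that the distribution of $s_n$ tends to the normal law, uniformly in $v$.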


Application to Sums of Independent Variables

9. Let $z_1, z_2, z_3, \ldots$ be independent variables whose number can be increased indefinitely. Without losing anything in generality, we may suppose from the beginning

$$E(z_k)=0;\qquad k=1,2,3,\ldots.$$

We assume the existence of

$$E(z_k^2)=b_k$$

for all $k=1,2,3,\ldots$. Also, we assume for some positive $\delta$ the existence of the absolute moments

$$E|z_k|^{2+\delta}=c_k^{(2+\delta)};\qquad k=1,2,3,\ldots.$$

Liapounoff’s theorem, with which we dealt at length in Chap. XIV, states that the probability of the inequality

$$\frac{z_1+z_2+\cdots+z_n}{\sqrt{2B_n}}<t,$$

where

$$B_n=b_1+b_2+\cdots+b_n,$$

tends uniformly to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}e^{-x^2}dx$$

as $n\to\infty$, provided

$$\frac{c_1^{(2+\delta)}+c_2^{(2+\delta)}+\cdots+c_n^{(2+\delta)}}{B_n^{1+\delta/2}}\to0.$$

Liapounoff’s result in regard to generality of conditions surpassed by far what had been established before by Tshebysheff and Markoff, whose proofs were based on the fundamental result derived in the preceding section. Since Liapounoff’s conditions do not require the existence of moments in an infinite number, it seemed that the method of moments was not powerful enough to establish the limit theorem in such a general form. Nevertheless, by resorting to an ingenious artifice, of which we made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the limit theorem by the method of moments to the same degree of generality as did Liapounoff.
Markoff’s artifice consists in associating with the variable $z_k$ two new variables $x_k$ and $y_k$ defined as follows:

Let $N$ be a positive number which in the course of the proof will be selected so as to tend to infinity together with $n$. Then

$$x_k=z_k,\quad y_k=0\qquad\text{if}\quad|z_k|\le N;$$

$$x_k=0,\quad y_k=z_k\qquad\text{if}\quad|z_k|>N.$$

Evidently $z_k$, $x_k$, $y_k$ are connected by the relation

$$z_k=x_k+y_k,$$

whence

$$E(x_k)+E(y_k)=0. \tag{21}$$

Moreover

$$E(x_k^2)+E(y_k^2)=E(z_k^2)=b_k,\qquad E|x_k|^{2+\delta}+E|y_k|^{2+\delta}=E|z_k|^{2+\delta}=c_k^{(2+\delta)}, \tag{22}$$

as one can see immediately from the definition of $x_k$ and $y_k$.

Since $x_k$ is bounded, the mathematical expectations

$$E(x_k^l)$$

exist for all integer exponents $l=1,2,3,\ldots$ and for $k=1,2,3,\ldots$.

In the following we shall use the notations

$$|E(x_k^l)|=c_k^{(l)};\qquad l=1,2,3,\ldots,$$

$$c_1^{(2)}+c_2^{(2)}+\cdots+c_n^{(2)}=B_n',$$

$$c_1^{(2+\delta)}+c_2^{(2+\delta)}+\cdots+c_n^{(2+\delta)}=C_n.$$
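Markoff’s splitting is easy to mimic on a discrete distribution. The Python sketch below uses a hypothetical two-point variable $z$ with $E(z)=0$ (values $-4$ and $1$ with probabilities $\frac15$ and $\frac45$), splits it at $N=2$, and checks relation (21), the first relation (22), and the bound $q\le E|z|^{2+\delta}/N^{2+\delta}$ (with $\delta=1$) that Lemma 1 below establishes. Exact fractions are used so the identities hold without rounding; the example data are illustrative only.

```python
from fractions import Fraction


def split(z, N):
    """Markoff's artifice: x = z, y = 0 if |z| <= N; x = 0, y = z if |z| > N."""
    return (z, 0) if abs(z) <= N else (0, z)


def expectation(dist, g):
    """E[g(z)] for a finite distribution given as (value, probability) pairs."""
    return sum(p * g(z) for z, p in dist)


dist = [(-4, Fraction(1, 5)), (1, Fraction(4, 5))]   # hypothetical z with E(z) = 0
N, delta = 2, 1

Ex = expectation(dist, lambda z: split(z, N)[0])
Ey = expectation(dist, lambda z: split(z, N)[1])
Ex2 = expectation(dist, lambda z: split(z, N)[0] ** 2)
Ey2 = expectation(dist, lambda z: split(z, N)[1] ** 2)
q = expectation(dist, lambda z: 1 if split(z, N)[1] != 0 else 0)   # probability that y != 0
c = expectation(dist, lambda z: abs(z) ** (2 + delta))             # E|z|^(2+delta)

print(Ex, Ey, Ex2 + Ey2, expectation(dist, lambda z: z * z), q, c / N ** (2 + delta))
```

Note that $E(x)$ itself is not zero here; only the sum $E(x)+E(y)$ vanishes, which is why the quantities $c_k^{(1)}=|E(x_k)|$ must be controlled separately (Lemma 4).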

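Not to obscure the essential steps of the reasoning, we shall first establish a few preliminary results.

Lemma 1. Let $q_k$ represent the probability that $y_k\ne0$; then

$$q_1+q_2+\cdots+q_n\le\frac{C_n}{N^{2+\delta}}.$$

Proof. Let $\varphi_k(x)$ be the distribution function of $z_k$. Since $y_k\ne0$ only if $|z_k|>N$, the probability $q_k$ is not greater than

$$\int_{-\infty}^{-N}d\varphi_k(x)+\int_{N}^{\infty}d\varphi_k(x).$$

On the other hand,

$$\int_{-\infty}^{\infty}|x|^{2+\delta}d\varphi_k(x)=c_k^{(2+\delta)}.$$

But

$$\int_{-\infty}^{\infty}|x|^{2+\delta}d\varphi_k(x)\ge N^{2+\delta}\left[\int_{-\infty}^{-N}d\varphi_k(x)+\int_{N}^{\infty}d\varphi_k(x)\right],$$

whence

$$q_k\le\int_{-\infty}^{-N}d\varphi_k(x)+\int_{N}^{\infty}d\varphi_k(x)\le\frac{c_k^{(2+\delta)}}{N^{2+\delta}}.$$

The inequality to be proved follows immediately.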
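Lemma 2. The following inequality holds:

$$1\ge\frac{B_n'}{B_n}\ge1-\frac{C_n}{B_nN^{\delta}}.$$

Proof. From

$$E|y_k|^{2+\delta}\le c_k^{(2+\delta)},$$

which is a consequence of the second equation (22), it follows that

$$E(y_k^2)\le\frac{c_k^{(2+\delta)}}{N^{\delta}},$$

because $|y_k|>N$ whenever $y_k\ne0$. The first equation (22) gives

$$c_k^{(2)}+E(y_k^2)=b_k,$$

whence

$$b_k\ge c_k^{(2)}\ge b_k-\frac{c_k^{(2+\delta)}}{N^{\delta}}.$$

Taking the sum for $k=1,2,3,\ldots,n$, we get

$$B_n\ge B_n'\ge B_n-\frac{C_n}{N^{\delta}},$$

whence

$$1\ge\frac{B_n'}{B_n}\ge1-\frac{C_n}{B_nN^{\delta}}.$$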

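Lemma 3. For $e\ge3$,

$$\frac{c_1^{(e)}+c_2^{(e)}+\cdots+c_n^{(e)}}{B_n^{e/2}}\le\left(\frac{N^2}{B_n}\right)^{\frac{e-2}{2}}.$$

Proof. This inequality follows immediately from the evident inequalities

$$c_k^{(e)}\le E|x_k|^{e}\le N^{e-2}E(x_k^2)\le N^{e-2}b_k.$$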
Lemma 4. The following inequality holds:

$$\frac{c_1^{(1)}+c_2^{(1)}+\cdots+c_n^{(1)}}{\sqrt{B_n}}\le\frac{C_n}{N^{1+\delta}\sqrt{B_n}}.$$

Proof. Since

$$E(x_k)+E(y_k)=0,$$

we have

$$c_k^{(1)}=|E(x_k)|=|E(y_k)|\le E|y_k|.$$

On the other hand, by virtue of Schwarz’s inequality (applied to $|y_k|$, which differs from 0 only with probability $q_k$),

$$\left[E|y_1|+E|y_2|+\cdots+E|y_n|\right]^2\le(q_1+q_2+\cdots+q_n)\sum_{k=1}^{n}E(y_k^2)\le\frac{C_n}{N^{2+\delta}}\cdot\frac{C_n}{N^{\delta}},$$

whence the statement follows immediately.

If the variable number $N$ should be subject to the requirements that both the ratios

$$\frac{C_n}{N^{2+\delta}}\quad\text{and}\quad\frac{N^2}{B_n}$$

should tend to 0 when $n$ increases indefinitely, then the preceding lemmas would give three important corollaries. But before stating these corollaries we must ascertain the possibility of selecting $N$ as required. It suffices to take

$$N=\left(C_nB_n^{1+\delta/2}\right)^{\frac{1}{4+2\delta}}.$$

Then

$$\frac{C_n}{N^{2+\delta}}=\sqrt{\frac{C_n}{B_n^{1+\delta/2}}}\to0$$

by virtue of Liapounoff’s condition. Also

$$\frac{N^2}{B_n}=\left(\frac{C_n}{B_n^{1+\delta/2}}\right)^{\frac{1}{2+\delta}}$$

will tend to 0. By selecting $N$ in this manner we can state the following corollaries:

Corollary 1. The sum

$$q_1+q_2+\cdots+q_n$$

tends to 0 as $n\to\infty$.

Corollary 2. The ratio

$$\frac{B_n'}{B_n}$$

tends to 1.

Corollary 3. The ratio

$$\frac{c_1^{(e)}+c_2^{(e)}+\cdots+c_n^{(e)}}{B_n^{e/2}}$$

tends to 0 for all positive integer exponents $e$ except $e=2$.
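To see a choice of $N$ at work, the sketch below takes a hypothetical case of identically distributed $z_k$ with $b_k=1$ and $c_k^{(2+\delta)}=c$, so that $B_n=n$ and $C_n=cn$, and assumes the admissible choice $N=(C_nB_n^{1+\delta/2})^{1/(4+2\delta)}$ (any $N$ making both ratios tend to 0 would serve). It prints $C_n/N^{2+\delta}$ and $N^2/B_n$ for growing $n$; the values $c=3$ and $\delta=0.5$ are arbitrary.

```python
def ratios(n, c=3.0, delta=0.5):
    """Return (C_n / N^(2+delta), N^2 / B_n) for B_n = n, C_n = c*n (hypothetical i.i.d. case)."""
    B = float(n)
    C = c * n
    N = (C * B ** (1 + delta / 2)) ** (1 / (4 + 2 * delta))
    return C / N ** (2 + delta), N ** 2 / B


for n in (10**2, 10**4, 10**6, 10**8):
    print(n, ratios(n))
```

Both ratios decrease, but slowly (like powers of $C_n/B_n^{1+\delta/2}=c\,n^{-\delta/2}$), which is all the corollaries require.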

10. Let $F_n(t)$ and $\Phi_n(t)$ represent, respectively, the probabilities of the inequalities

$$\frac{z_1+z_2+\cdots+z_n}{\sqrt{2B_n}}<t,\qquad\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}<t.$$

By repeating the reasoning developed in Chap. X, Sec. 8, we find that

$$|F_n(t)-\Phi_n(t)|\le q_1+q_2+\cdots+q_n.$$

Hence,

$$\lim_{n\to\infty}[F_n(t)-\Phi_n(t)]=0$$

by Corollary 1. It suffices therefore to show that

$$\Phi_n(t)\to\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}e^{-x^2}dx\quad\text{as}\quad n\to\infty,$$

and that can be done by the method of moments. By the polynomial theorem

$$\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m=\frac{m!}{2^{m/2}B_n^{m/2}}\sum\frac{S_{\alpha,\beta,\ldots,\lambda}}{\alpha!\,\beta!\cdots\lambda!},$$

where the summation extends over all systems of positive integers

$$\alpha\ge\beta\ge\cdots\ge\lambda$$

satisfying the condition

$$\alpha+\beta+\cdots+\lambda=m,$$

and $S_{\alpha,\beta,\ldots,\lambda}$ denotes a symmetrical function of the letters $x_1, x_2, \ldots, x_n$ determined by one of its terms

$$x_1^{\alpha}x_2^{\beta}\cdots x_l^{\lambda},$$






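if $l$ represents the number of integers $\alpha,\beta,\ldots,\lambda$. Since the variables $x_1, x_2, \ldots, x_n$ are independent, we have

$$E\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m=\frac{m!}{2^{m/2}B_n^{m/2}}\sum\frac{G_{\alpha,\beta,\ldots,\lambda}}{\alpha!\,\beta!\cdots\lambda!},$$

where $G_{\alpha,\beta,\ldots,\lambda}$ is obtained from $S_{\alpha,\beta,\ldots,\lambda}$ by replacing powers of the variables by the mathematical expectations of these powers. It is almost evident that

$$|G_{\alpha,\beta,\ldots,\lambda}|\le\left(c_1^{(\alpha)}+\cdots+c_n^{(\alpha)}\right)\left(c_1^{(\beta)}+\cdots+c_n^{(\beta)}\right)\cdots\left(c_1^{(\lambda)}+\cdots+c_n^{(\lambda)}\right).$$

Now if not all the exponents $\alpha,\beta,\ldots,\lambda$ are equal to 2 (all of them can equal 2 only when $m$ is even), then by virtue of Corollary 3 the right member divided by $B_n^{m/2}$, and with it

$$\frac{G_{\alpha,\beta,\ldots,\lambda}}{B_n^{m/2}},$$

tends to 0. Hence

$$E\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m\to0$$

if $m$ is odd.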

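But for even $m$ we have

$$E\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m-\frac{m!}{2^m}\cdot\frac{G_{2,2,\ldots,2}}{B_n^{m/2}}\to0. \tag{23}$$

Let us consider now ($m$ being even)

$$\Theta=\left(\frac{c_1^{(2)}+c_2^{(2)}+\cdots+c_n^{(2)}}{B_n}\right)^{m/2}=\left(\frac{m}{2}\right)!\sum\frac{H_{\lambda,\mu,\ldots,\omega}}{\lambda!\,\mu!\cdots\omega!\,B_n^{m/2}},$$

where the summation extends over all systems of positive integers

$$\lambda\ge\mu\ge\cdots\ge\omega$$

satisfying the condition

$$\lambda+\mu+\cdots+\omega=\frac{m}{2},$$

and $H_{\lambda,\mu,\ldots,\omega}$ is a symmetric function of $c_1^{(2)}, c_2^{(2)}, \ldots, c_n^{(2)}$ determined by its term

$$(c_1^{(2)})^{\lambda}(c_2^{(2)})^{\mu}\cdots(c_l^{(2)})^{\omega},$$

$l$ being the number of subscripts $\lambda,\mu,\ldots,\omega$. Apparently

$$\frac{H_{\lambda,\mu,\ldots,\omega}}{B_n^{m/2}}\le\frac{(c_1^{(2)})^{\lambda}+\cdots+(c_n^{(2)})^{\lambda}}{B_n^{\lambda}}\cdot\frac{(c_1^{(2)})^{\mu}+\cdots+(c_n^{(2)})^{\mu}}{B_n^{\mu}}\cdots\frac{(c_1^{(2)})^{\omega}+\cdots+(c_n^{(2)})^{\omega}}{B_n^{\omega}}.$$

Besides,

$$\frac{(c_1^{(2)})^{\omega}+(c_2^{(2)})^{\omega}+\cdots+(c_n^{(2)})^{\omega}}{B_n^{\omega}}\le\frac{(c_1^{(2)})^{2}+(c_2^{(2)})^{2}+\cdots+(c_n^{(2)})^{2}}{B_n^{2}}\le\frac{c_1^{(4)}+c_2^{(4)}+\cdots+c_n^{(4)}}{B_n^{2}}$$

if $\omega>1$. Thus, by Corollary 3,

$$\frac{H_{\lambda,\mu,\ldots,\omega}}{B_n^{m/2}}\to0$$

if not all the subscripts $\lambda,\mu,\ldots,\omega$ are equal to 1. It follows that

$$\Theta-\left(\frac{m}{2}\right)!\,\frac{H_{1,1,\ldots,1}}{B_n^{m/2}}\to0.$$

But by Corollary 2

$$\Theta\to1,$$

and evidently $H_{1,1,\ldots,1}=G_{2,2,\ldots,2}$. Hence

$$\left(\frac{m}{2}\right)!\,\frac{G_{2,2,\ldots,2}}{B_n^{m/2}}\to1,$$

and this in connection with (23) shows that for an even $m$

$$E\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m\to\frac{m!}{2^m\left(\frac{m}{2}\right)!}=\frac{1\cdot3\cdot5\cdots(m-1)}{2^{m/2}}.$$

Finally, no matter whether the exponent $m$ is odd or even, we have

$$E\left(\frac{x_1+x_2+\cdots+x_n}{\sqrt{2B_n}}\right)^m\to\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}x^me^{-x^2}dx.$$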
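The limit obtained for even $m$ agrees with the normal moment because $m!/(2^m(m/2)!)=1\cdot3\cdot5\cdots(m-1)/2^{m/2}$. A short Python check in exact arithmetic (function names are illustrative only):

```python
import math
from fractions import Fraction


def limit_from_argument(m):
    """m! / (2^m (m/2)!), the limit of the m-th moment found above for even m."""
    return Fraction(math.factorial(m), 2 ** m * math.factorial(m // 2))


def normal_even_moment(m):
    """1*3*5*...*(m-1) / 2^(m/2), the m-th moment of the density e^{-x^2}/sqrt(pi)."""
    return Fraction(math.prod(range(1, m, 2)), 2 ** (m // 2))


for m in range(2, 13, 2):
    print(m, limit_from_argument(m), normal_even_moment(m))
```

The identity itself follows from $m!=2^{m/2}(m/2)!\cdot1\cdot3\cdots(m-1)$; the code only confirms it for small $m$.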
Tshebysheff-Markoff’s fundamental theorem can be applied directly and leads to the result:

$$\lim_{n\to\infty}\Phi_n(t)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}e^{-x^2}dx$$

uniformly in $t$. On the other hand, as has been established before,

$$\lim_{n\to\infty}[F_n(t)-\Phi_n(t)]=0,$$

and uniformly in $t$, since the bound $q_1+q_2+\cdots+q_n$ does not depend on $t$. Hence, finally,

$$\lim_{n\to\infty}F_n(t)=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t}e^{-x^2}dx$$

uniformly in $t$.

And this is the fundamental limit theorem with Liapounoff’s conditions, now proved by the method of moments. This proof, due to Markoff, is simple enough and of high elegance. However, the preliminary considerations which underlie the proof of the fundamental theorem, though simple and elegant also, are rather long. Nevertheless, we must bear in mind that they are not only useful in connection with the theory of probability, but also of great importance in other fields of analysis.
ON A GAUSSIAN PROBLEM

1. In a letter to Laplace dated January 30, 1812,¹ Gauss mentions a difficult problem in probability for which he could not find a perfectly satisfactory solution. We quote from his letter (in English translation):

I recall, however, a curious problem with which I occupied myself 12 years ago, but which I did not then succeed in solving to my satisfaction. Perhaps you will deign to devote a few moments to it: in that case I am sure that you will find a more complete solution. Here it is: Let $M$ be an unknown quantity between the limits 0 and 1, for which all values are either equally probable or more or less so according to a given law; suppose it converted into a continued fraction

$$M=\cfrac{1}{a'+\cfrac{1}{a''+\cdots}}.$$

What is the probability that, stopping the expansion at a finite term $a^{(n)}$, the following fraction

$$\cfrac{1}{a^{(n+1)}+\cfrac{1}{a^{(n+2)}+\cdots}}$$

lies between the limits 0 and $x$? I denote it by $P(n,x)$, and, supposing all values equally probable, I have

$$P(0,x)=x.$$

$P(1,x)$ is a transcendental function depending on the function

$$1+\frac12+\frac13+\cdots+\frac1x,$$

which Euler calls inexplicable and on which I have just given several researches in a memoir presented to our Society of Sciences, which will soon be printed. But for the case where $n$ is larger, the exact value of $P(n,x)$ seems intractable. However, I have found by very simple reasoning that for $n$ infinite

$$P(n,x)=\frac{\log(1+x)}{\log2}.$$

But the efforts that I made during my researches to assign

$$P(n,x)-\frac{\log(1+x)}{\log2}$$

for a very large, but not infinite, value of $n$ have been fruitless.

¹ Gauss’ Werke, X, 1, p. 371.

The problem itself and the main difficulty in its solution are clearly 
indicated in this passage. The problem is difficult indeed, and no 
satisfactory solution was offered before 1928, when Professor R. O. 
Kuzmin succeeded in solving it in a very remarkable and elegant way. 

2. Analytical Expression for $P_n(x)$. We shall use the notation $P_n(x)$ for the probability which Gauss designated by $P(n,x)$. The first question that presents itself is how to express $P_n(x)$ in a proper analytical form. Let $\delta(v_1,v_2,\ldots,v_n,x)$ be an interval whose end points are represented by two continued fractions:

$$\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n}}}\qquad\text{and}\qquad\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n+x}}}$$

with positive integer incomplete quotients $v_1, v_2, \ldots, v_n$, while $x$ is a positive number $\le1$. Two such intervals corresponding to two different systems of integers $v_1, v_2, \ldots, v_n$ and $v_1', v_2', \ldots, v_n'$ do not overlap; that is, they do not have common inner points. For, if they had a common inner point represented by an irrational number $N$ (which we can always suppose), we should have for some positive $x'<1$ and $x''<1$

$$N=\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n+x'}}}=\cfrac{1}{v_1'+\cfrac{1}{v_2'+\cdots+\cfrac{1}{v_n'+x''}}}.$$

But that is impossible unless $v_1'=v_1,\ v_2'=v_2,\ \ldots,\ v_n'=v_n$.

A number $M$ being selected at random between 0 and 1 and converted into a continued fraction

$$M=\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n+\xi}}},$$

if the quantity $\xi$ turns out to be contained between 0 and $x<1$, $M$ must belong to one (and only one) of the intervals $\delta(v_1,v_2,\ldots,v_n,x)$ corresponding to one of all the possible systems of $n$ positive integers $v_1, v_2, \ldots, v_n$. Since $M$ has a uniform distribution of probability and since the length of the interval $\delta(v_1,v_2,\ldots,v_n,x)$ is

$$(-1)^n\left[\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n+x}}}-\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n}}}\right],$$

the required probability $P_n(x)$ will be expressed by the sum

$$P_n(x)=\sum(-1)^n\left[\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n+x}}}-\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_n}}}\right]$$

extended over all systems of positive integers $v_1, v_2, \ldots, v_n$. In general let

$$\frac{P_i}{Q_i}=\cfrac{1}{v_1+\cfrac{1}{v_2+\cdots+\cfrac{1}{v_i}}}\qquad(i=1,2,\ldots,n)$$

be a convergent to the continued fraction $M$. Then the above expression for $P_n(x)$ can be exhibited in a more convenient form:

$$P_n(x)=\sum_{v_1,v_2,\ldots,v_n}(-1)^n\left[\frac{P_n+xP_{n-1}}{Q_n+xQ_{n-1}}-\frac{P_n}{Q_n}\right]. \tag{1}$$




Since $P_{n-1}Q_n-P_nQ_{n-1}=(-1)^n$, each term of (1) equals $x/[Q_n(Q_n+xQ_{n-1})]$. By the very definition of $P_n(x)$ we must have $P_n(1)=1$; hence the important relation

$$\sum\frac{1}{Q_n(Q_n+Q_{n-1})}=1. \tag{2}$$

This result can also be established directly by resorting to the original expression of $P_n(1)$ and performing summation first with respect to $v_n$, then with respect to $v_{n-1}$, etc.

Relation (2) can be interpreted as follows: Let $\delta$ in general be the length of an interval $\delta(v_1,v_2,\ldots,v_n,1)$. Then

$$\sum\delta=1,$$

summation being extended over the (enumerable) set of intervals $\delta$.
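Relation (2) can be watched numerically. The Python sketch computes $Q_n$ by $Q_i=v_iQ_{i-1}+Q_{i-2}$ (starting from $Q_{-1}=0$, $Q_0=1$, which reproduces $Q_1=v_1$) and sums the lengths $1/[Q_n(Q_n+Q_{n-1})]$ of the intervals $\delta(v_1,\ldots,v_n,1)$ over all $v_i\le V$. The cutoff $V$ is an assumption of the illustration, so the sums fall short of 1 by an amount that shrinks as $V$ grows.

```python
from fractions import Fraction
from itertools import product


def denominators(vs):
    """Return (Q_{n-1}, Q_n) for partial quotients vs = (v_1, ..., v_n)."""
    q_prev, q = 0, 1                    # Q_{-1} = 0, Q_0 = 1
    for v in vs:
        q_prev, q = q, v * q + q_prev   # Q_i = v_i Q_{i-1} + Q_{i-2}
    return q_prev, q


def total_length(n, V, exact=False):
    """Sum of 1/(Q_n (Q_n + Q_{n-1})) over all systems v_1, ..., v_n with 1 <= v_i <= V."""
    one = Fraction(1) if exact else 1.0
    total = 0
    for vs in product(range(1, V + 1), repeat=n):
        q_prev, q = denominators(vs)
        total += one / (q * (q + q_prev))
    return total


print(total_length(1, 99, exact=True))   # 1 - 1/100 by telescoping
print(total_length(2, 200))              # close to 1, slightly below
```

For $n=1$ the sum telescopes exactly, $\sum_{v\le V}1/[v(v+1)]=1-1/(V+1)$, which mirrors the summation "first with respect to $v_n$" mentioned above.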
3. The Derivative of $P_n(x)$. In attempting to show that $P_n(x)$ tends uniformly to a limit function as $n\to\infty$, it is easier to begin with its derivative $p_n(x)=P_n'(x)$. The series

$$\sum\frac{1}{(Q_n+xQ_{n-1})^2},$$

obtained by formal differentiation of (1), is uniformly convergent in the interval $(0,1)$. For

$$Q_n\ge\frac{Q_n+Q_{n-1}}{2},$$

whence

$$\frac{1}{(Q_n+xQ_{n-1})^2}\le\frac{1}{Q_n^2}\le\frac{2}{Q_n(Q_n+Q_{n-1})},$$

and the series

$$\sum\frac{2}{Q_n(Q_n+Q_{n-1})}=2$$

is convergent. Hence

$$\frac{dP_n(x)}{dx}=p_n(x)=\sum\frac{1}{(Q_n+xQ_{n-1})^2}.$$

Since

$$Q_n=v_nQ_{n-1}+Q_{n-2},$$

we have

$$p_n(x)=\sum\frac{1}{[(v_n+x)Q_{n-1}+Q_{n-2}]^2}=\sum\frac{1}{(v_n+x)^2}\cdot\frac{1}{\left(Q_{n-1}+\dfrac{1}{v_n+x}Q_{n-2}\right)^2},$$

and, performing summation with respect to $v_1, v_2, \ldots, v_{n-1}$ for constant $v_n$,

$$p_n(x)=\sum_{v_n=1}^{\infty}p_{n-1}\!\left(\frac{1}{v_n+x}\right)\frac{1}{(v_n+x)^2},$$

or else

$$p_n(x)=\sum_{v=1}^{\infty}p_{n-1}\!\left(\frac{1}{v+x}\right)\frac{1}{(v+x)^2}, \tag{3}$$

an important recurrence relation which permits determining completely the sequence of functions

$$p_1(x), p_2(x), \ldots$$

starting with $p_0(x)=1$.
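Recurrence (3) can also be iterated numerically. The sketch below is an approximation scheme chosen only for illustration, not Kuzmin's method: $p_{n-1}$ is stored on a uniform grid over $[0,1]$ and read off by linear interpolation, the sum over $v$ is cut at $V=500$, and the neglected tail is estimated by $p_{n-1}(0)/(V+x+\tfrac12)$. Starting from $p_0=1$, eight steps bring the values close to $1/[(1+x)\log 2]$, in line with Gauss's statement.

```python
import math


def step(values, V=500):
    """Apply (3) once: p_n(x) = sum_v p_{n-1}(1/(v+x)) / (v+x)^2, for x on the grid i/m."""
    m = len(values) - 1

    def p_prev(t):
        # linear interpolation of p_{n-1} between grid points, 0 <= t <= 1
        pos = t * m
        i = min(int(pos), m - 1)
        frac = pos - i
        return values[i] * (1 - frac) + values[i + 1] * frac

    new_values = []
    for i in range(m + 1):
        x = i / m
        s = sum(p_prev(1.0 / (v + x)) / (v + x) ** 2 for v in range(1, V + 1))
        s += values[0] / (V + x + 0.5)   # approximate tail: sum_{v>V} 1/(v+x)^2 ~ 1/(V+x+1/2)
        new_values.append(s)
    return new_values


m = 100
p = [1.0] * (m + 1)                      # p_0(x) = 1
for _ in range(8):
    p = step(p)
limit = [1 / ((1 + i / m) * math.log(2)) for i in range(m + 1)]
print(max(abs(a - b) for a, b in zip(p, limit)))
```

The grid size, cutoff and tail estimate trade accuracy for speed; they introduce small errors of their own, so the printed maximum deviation reflects both the convergence of $p_n$ and the discretization.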

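4. Discussion of a More General Recurrence Relation. In discussing relation (3) the fact that $p_0(x)=1$ is of no consequence. We may start with any function $f_0(x)$ subject to some natural limitations, and form a sequence

$$f_1(x), f_2(x), \ldots$$

by means of the recurrence relation

$$f_n(x)=\sum_{v=1}^{\infty}f_{n-1}\!\left(\frac{1}{v+x}\right)\frac{1}{(v+x)^2}. \tag{4}$$

The following properties of $f_n(x)$ follow easily from this relation:

a. If

$$f_0(x)=\frac{M}{1+x},$$

then

$$f_n(x)=\frac{M}{1+x};\qquad n=1,2,3,\ldots.$$

For

$$\sum_{v=1}^{\infty}\frac{M}{1+\frac{1}{v+x}}\cdot\frac{1}{(v+x)^2}=M\sum_{v=1}^{\infty}\left[\frac{1}{v+x}-\frac{1}{v+x+1}\right]=\frac{M}{1+x}.$$

b. If

$$\frac{m}{1+x}\le f_0(x)\le\frac{M}{1+x},$$

then

$$\frac{m}{1+x}\le f_n(x)\le\frac{M}{1+x}.$$

This follows from (a) and equation (4) itself.

As a corollary we have: Let $M_n$ and $m_n$ be the precise upper and lower bounds of

$$(1+x)f_n(x)\qquad(n=0,1,2,\ldots)$$

in the interval $0\le x\le1$. Then

$$M_0\ge M_1\ge M_2\ge\cdots,\qquad m_0\le m_1\le m_2\le\cdots.$$

c. We have

$$\int_0^1f_n(x)\,dx=\sum_{v=1}^{\infty}\int_0^1f_{n-1}\!\left(\frac{1}{v+x}\right)\frac{dx}{(v+x)^2}=\int_0^1f_{n-1}(x)\,dx=\cdots=\int_0^1f_0(x)\,dx.$$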
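Property (a) rests on the telescoping identity $\frac{1}{(v+x)(v+x+1)}=\frac{1}{v+x}-\frac{1}{v+x+1}$. The Python sketch applies the first $V$ terms of (4) to $f(t)=M/(1+t)$ in exact rational arithmetic and confirms that the partial sum equals $M[1/(1+x)-1/(V+1+x)]$, whose limit as $V\to\infty$ is $M/(1+x)$. The values of $M$, $x$ and $V$ are arbitrary samples.

```python
from fractions import Fraction


def partial_operator(f, x, V):
    """First V terms of (4): sum_{v=1}^{V} f(1/(v+x)) / (v+x)^2."""
    return sum(f(1 / (v + x)) / (v + x) ** 2 for v in range(1, V + 1))


M = Fraction(3, 2)
x = Fraction(1, 3)
V = 50


def f(t):
    # the invariant function M/(1+t) of property (a)
    return M / (1 + t)


lhs = partial_operator(f, x, V)
rhs = M * (1 / (1 + x) - 1 / (V + 1 + x))
print(lhs == rhs, float(lhs), float(M / (1 + x)))
```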


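d. The following relations can easily be established by mathematical induction:

$$f_n(x)=\sum f_0\!\left(\frac{P_n+xP_{n-1}}{Q_n+xQ_{n-1}}\right)\frac{1}{(Q_n+xQ_{n-1})^2},$$

$$f_{2n}(x)=\sum f_n\!\left(\frac{P_n+xP_{n-1}}{Q_n+xQ_{n-1}}\right)\frac{1}{(Q_n+xQ_{n-1})^2},$$

$$f_{3n}(x)=\sum f_{2n}\!\left(\frac{P_n+xP_{n-1}}{Q_n+xQ_{n-1}}\right)\frac{1}{(Q_n+xQ_{n-1})^2},$$

$$\cdots\cdots\cdots$$

Let us suppose now that the function $f_0(x)$ defined in the interval

$$0\le x\le1$$

possesses a derivative everywhere in this interval, and let $\mu_0$ be an upper bound of $|f_0'(x)|$, while $M$ is an upper bound of $|(1+x)f_0(x)|$. Then by property (b)

$$|f_n(x)|\le M;\qquad|f_{2n}(x)|\le M;\qquad|f_{3n}(x)|\le M;\ \ldots.$$

The function $f_n(x)$ represented by the series

$$f_n(x)=\sum\frac{f_0(u)}{(Q_n+xQ_{n-1})^2},$$

where $u$ stands for

$$\frac{P_n+xP_{n-1}}{Q_n+xQ_{n-1}},$$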
has a derivative; for the series obtained by a formal differentiation,

$$\sum f_0'(u)\frac{(-1)^n}{(Q_n+xQ_{n-1})^4}-2\sum f_0(u)\frac{Q_{n-1}}{(Q_n+xQ_{n-1})^3},$$

is uniformly convergent and represents $f_n'(x)$. Now

$$\frac{Q_{n-1}}{Q_n+xQ_{n-1}}\le1$$

and

$$\frac{1}{(Q_n+xQ_{n-1})^2}\le\frac{1}{Q_n^2}\le\frac{2}{Q_n(Q_n+Q_{n-1})}.$$

Hence

$$\left|2\sum f_0(u)\frac{Q_{n-1}}{(Q_n+xQ_{n-1})^3}\right|\le4M\sum\frac{1}{Q_n(Q_n+Q_{n-1})}=4M$$

by virtue of (2). On the other hand, the inequality

$$Q_n(Q_n+Q_{n-1})=(v_nQ_{n-1}+Q_{n-2})[(v_n+1)Q_{n-1}+Q_{n-2}]>2Q_{n-1}(Q_{n-1}+Q_{n-2}),$$

holding for $n\ge2$, together with the evident inequality

$$Q_1(Q_1+Q_0)\ge2,$$

shows that

$$Q_n(Q_n+Q_{n-1})>2^n\qquad(n\ge2).$$

Thus

$$(Q_n+xQ_{n-1})^4\ge Q_n^4\ge Q_n^2\cdot\frac{Q_n(Q_n+Q_{n-1})}{2}>2^{n-2}Q_n(Q_n+Q_{n-1}),$$

and consequently

$$\left|\sum f_0'(u)\frac{(-1)^n}{(Q_n+xQ_{n-1})^4}\right|\le\frac{\mu_0}{2^{n-2}}.$$

Hence we may conclude that

$$\mu_1=\frac{\mu_0}{2^{n-2}}+4M$$

is an upper bound of $|f_n'(x)|$. Similarly, starting with the second equation in (d), we find that

$$\mu_2=\frac{\mu_1}{2^{n-2}}+4M$$

is an upper bound of $|f_{2n}'(x)|$, and so forth. In general, the recurrence relation

$$\mu_k=\frac{\mu_{k-1}}{2^{n-2}}+4M\qquad(k=1,2,3,\ldots)$$

determines upper bounds of

$$|f_n'(x)|,\ |f_{2n}'(x)|,\ |f_{3n}'(x)|,\ \ldots.$$

It is easy to see that in general

$$\mu_k\le\frac{\mu_0}{2^{k(n-2)}}+\frac{4M}{1-2^{-(n-2)}},$$

so that for sufficiently large $n$

$$\mu_k<\mu_0+5M.$$


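5. Main Inequalities. Let

$$\varphi(x)=f_0(x)-\frac{m_0}{1+x}.$$

Then, by (a) and (d),

$$f_n(x)-\frac{m_0}{1+x}=\varphi_n(x)=\sum\frac{\varphi(u)}{(Q_n+xQ_{n-1})^2}.$$

Since the intervals $\delta$ defined at the end of Sec. 2 do not overlap and cover completely the whole interval $(0,1)$, we may write

$$l=\int_0^1\varphi(x)\,dx=\sum\frac{\varphi(u_1)}{Q_n(Q_n+Q_{n-1})},$$

the latter part following from the mean value theorem, $u_1$ being a number contained within the interval $\delta$. Since $\varphi(u)\ge0$ and $1/(Q_n+xQ_{n-1})^2\ge1/(Q_n+Q_{n-1})^2\ge1/[2Q_n(Q_n+Q_{n-1})]$, by subtraction we find

$$\varphi_n(x)-\frac{l}{2}\ge\frac12\sum\frac{\varphi(u)-\varphi(u_1)}{Q_n(Q_n+Q_{n-1})},$$

and, since both $u$ and $u_1$ belong to the same interval $\delta$,

$$\varphi(u)-\varphi(u_1)>-\frac{\mu_0+m_0}{Q_n(Q_n+Q_{n-1})}>-\frac{\mu_0+m_0}{2^n}.$$

Consequently,

$$f_n(x)-\frac{m_0}{1+x}-\frac{l}{2}>-\frac{\mu_0+m_0}{2^{n+1}},$$

and a fortiori

$$f_n(x)>\frac{m_0+\frac{l}{2}-2^{-n}(\mu_0+m_0)}{1+x}.$$

It follows that

$$m_1\ge m_0+\frac{l}{2}-2^{-n}(\mu_0+m_0).^{*} \tag{5}$$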

In a similar way, considering the function

$$\psi(x)=\frac{M_0}{1+x}-f_0(x)$$

and setting

$$l_1=\int_0^1\psi(x)\,dx,$$

we shall have

$$f_n(x)<\frac{M_0-\frac{l_1}{2}+2^{-n}(\mu_0+M_0)}{1+x},$$

whence

$$M_1\le M_0-\frac{l_1}{2}+2^{-n}(\mu_0+M_0). \tag{6}$$

Further, from (5) and (6),

$$M_1-m_1\le M_0-m_0+2^{-n+1}(\mu_0+M_0)-\frac{l+l_1}{2}.$$

But

$$\frac{l+l_1}{2}=\frac12\log2\cdot(M_0-m_0)=(1-k)(M_0-m_0);\qquad k<0.66,$$

so that finally

$$M_1-m_1<k(M_0-m_0)+2^{-n+1}(\mu_0+M_0).$$

Starting with $f_n(x), f_{2n}(x), \ldots$ instead of $f_0(x)$, in a similar way we find

$$M_2-m_2<k(M_1-m_1)+2^{-n+1}(\mu_1+M_1),$$

$$M_3-m_3<k(M_2-m_2)+2^{-n+1}(\mu_2+M_2),$$

$$\cdots\cdots\cdots$$

$$M_n-m_n<k(M_{n-1}-m_{n-1})+2^{-n+1}(\mu_{n-1}+M_{n-1}).$$

From these inequalities it follows that

$$M_n-m_n<(M_0-m_0)k^n+2^{-n+1}\left[\mu_{n-1}+k\mu_{n-2}+\cdots+k^{n-1}\mu_0+M_{n-1}+kM_{n-2}+\cdots+k^{n-1}M_0\right].$$

Without losing anything in generality, we may suppose that $f_0(x)$ is a positive function. Then

*$M_1$, $m_1$ are used here with the same meaning as $M_n$, $m_n$ in Sec. 4.
$$M_k\le M_0,\qquad\mu_k<\mu_0+5M_0\qquad(k=1,2,3,\ldots),$$

at least for sufficiently large $n$. Owing to these inequalities we shall have

$$M_n-m_n<(M_0-m_0)k^n+\frac{\mu_0+6M_0}{(1-k)\,2^{n-1}}. \tag{7}$$

This inequality shows that the sequences

$$M_0\ge M_1\ge M_2\ge\cdots,\qquad m_0\le m_1\le m_2\le\cdots$$

approach a common limit $a$. The following method can be used to find the value of this limit. Let $N$ be an arbitrary sufficiently large integer and $n$ the integer defined by

$$n^2\le N<(n+1)^2.$$

Then

$$\frac{m_n}{1+x}\le f_{n^2}(x)\le\frac{M_n}{1+x},$$

and therefore

$$\frac{m_n}{1+x}\le f_N(x)\le\frac{M_n}{1+x}.$$

The last inequality permits presenting $f_N(x)$ thus:

$$f_N(x)=\frac{a}{1+x}+\theta(M_n-m_n);\qquad|\theta|<1, \tag{8}$$

whence

$$\int_0^1f_N(x)\,dx=\int_0^1f_0(x)\,dx=a\log2+\theta'(M_n-m_n),\qquad|\theta'|<1,$$

and, because $M_n-m_n$ ultimately becomes as small as we please in absolute value,

$$a\log2=\int_0^1f_0(x)\,dx.$$

Equation (8) shows clearly that the sequence of functions

$$f_1(x), f_2(x), f_3(x), \ldots$$

defined by the recurrence relation (4) approaches uniformly the limit function

$$\frac{a}{1+x},$$

where

$$a=\frac{1}{\log2}\int_0^1f_0(x)\,dx.$$
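The constant $k=1-\tfrac12\log2$ and the right side of (7) are easy to evaluate. In the Python sketch below the values $M_0=2$, $m_0=1$, $\mu_0=0$ are an assumed sample input (they are the ones that arise for $f_0=1$); the function only evaluates the bound and is not a proof of it.

```python
import math

k = 1 - 0.5 * math.log(2)   # about 0.6534, below 0.66 as stated


def bound_7(n, M0, m0, mu0):
    """Right side of (7): (M0 - m0) k^n + (mu0 + 6 M0) / ((1 - k) 2^(n-1))."""
    return (M0 - m0) * k ** n + (mu0 + 6 * M0) / ((1 - k) * 2 ** (n - 1))


for n in (5, 10, 20, 40):
    print(n, bound_7(n, M0=2, m0=1, mu0=0))
```

Since $k>\tfrac12$, the term $k^n$ eventually dominates, so the bound decays geometrically with ratio about $0.65$ per block of $n$ steps.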

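6. Solution of the Gaussian Problem. It suffices to apply the preceding considerations to the case $f_0(x)=p_0(x)=1$. In this case $M_0=2$, $m_0=1$, $\mu_0=0$ and

$$a=\frac{1}{\log2}.$$

Consequently,

$$p_N(x)=\frac{1}{(1+x)\log2}+\theta\left[k^n+\frac{12}{(1-k)\cdot2^{n-1}}\right],\qquad|\theta|<1,$$

where $n=[\sqrt N]$. It suffices to integrate this expression between the limits 0 and $\xi\le1$ to find

$$P_N(\xi)=\frac{\log(1+\xi)}{\log2}+\theta'\left[k^n+\frac{12}{(1-k)\cdot2^{n-1}}\right],\qquad|\theta'|<1.$$

As $N\to\infty$,

$$P_N(\xi)\to\frac{\log(1+\xi)}{\log2},$$

as stated by Gauss. Moreover,

$$\left|P_N(\xi)-\frac{\log(1+\xi)}{\log2}\right|<k^n+\frac{12}{(1-k)\cdot2^{n-1}},\qquad n=[\sqrt N],$$

for sufficiently large, but finite, $N$.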
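Gauss's limit can be checked by simulation. The Python sketch draws $M$ uniformly from $(0,1)$, strips off $N$ partial quotients with the map $y\mapsto1/y-\lfloor1/y\rfloor$, and compares the empirical frequency of a remainder below $\xi$ with $\log(1+\xi)/\log2$. It is a Monte Carlo illustration in floating-point arithmetic, not part of Kuzmin's argument; the sample size, seed and $N=10$ are arbitrary.

```python
import math
import random


def remainder_after(M, N):
    """Apply y -> 1/y - floor(1/y) N times, i.e. remove the first N partial quotients of M."""
    y = M
    for _ in range(N):
        if y == 0.0:            # the expansion terminated exactly; stop early
            break
        y = 1.0 / y
        y -= math.floor(y)
    return y


rng = random.Random(12345)
N = 10
remainders = [remainder_after(rng.random(), N) for _ in range(100_000)]

for xi in (0.25, 0.5, 0.75):
    freq = sum(r < xi for r in remainders) / len(remainders)
    print(xi, round(freq, 4), round(math.log1p(xi) / math.log(2), 4))
```

With $10^5$ samples the statistical error of each frequency is roughly $0.002$, far larger than the deviation predicted by the bound above for $N=10$, so the agreement observed is limited by sampling, not by the theory.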



Table of the Probability Integral

$$\frac{1}{\sqrt{2\pi}}\int_0^u e^{-x^2/2}\,dx$$

 u      Integral
1.30    0.4032
1.31    0.4049
1.32    0.4066
1.33    0.4082
1.34    0.4099
1.35    0.4115
1.36    0.4131
1.37    0.4147
1.38    0.4162
1.39    0.4177
1.40    0.4192
1.41    0.4207
1.42    0.4222
1.43    0.4236
1.44    0.4251
1.45    0.4265
1.46    0.4279
1.47    0.4292
1.48    0.4306
1.49    0.4319
1.50    0.4332
1.51    0.4345
1.52    0.4357
1.53    0.4370
1.54    0.4382
1.55    0.4394
1.56    0.4406
1.57    0.4418
1.58    0.4429
1.59    0.4441
1.60    0.4452
1.61    0.4463
1.62    0.4474
1.63    0.4484
1.64    0.4495
1.65    0.4505
1.66    0.4515
1.67    0.4525
1.68    0.4535
1.69    0.4545
1.70    0.4554
1.71    0.4564
1.72    0.4573
1.73    0.4582
1.74    0.4591
1.75    0.4599
1.76    0.4608
1.77    0.4616
1.78    0.4625
1.79    0.4633
1.80    0.4641
1.81    0.4649
1.82    0.4656
1.83    0.4664
1.84    0.4671
1.85    0.4678
1.86    0.4686
1.87    0.4693
1.88    0.4699
1.89    0.4706
1.90    0.4713
1.91    0.4719
1.92    0.4726
1.93    0.4732
1.94    0.4738
1.97    0.4756
1.98    0.4761
1.99    0.4767
2.00    0.4772
2.02    0.4783
2.04    0.4793
2.06    0.4803
2.08    0.4812
2.10    0.4821
2.12    0.4830
2.14    0.4838
2.16    0.4846
2.18    0.4854
2.20    0.4861
2.22    0.4868
2.24    0.4875
2.26    0.4881
2.28    0.4887
2.30    0.4893
2.32    0.4898
2.34    0.4904
2.36    0.4909
2.38    0.4913
2.40    0.4918
2.42    0.4922
2.44    0.4927
2.46    0.4931
2.48    0.4934
2.50    0.4938
2.52    0.4941
2.54    0.4945
2.56    0.4948
2.58    0.4951
2.60    0.4953
2.62    0.4956
2.64    0.4959
2.66    0.4961
2.68    0.4963
2.70    0.4965
2.72    0.4967
2.74    0.4969
2.76    0.4971
2.78    0.4973
2.80    0.4974
2.82    0.4976
2.84    0.4977
2.86    0.4979
2.88    0.4980
2.90    0.4981
2.92    0.4982
2.94    0.4984
2.96    0.4985
2.98    0.4986
3.00    0.49865
3.20    0.49931
3.40    0.49966
3.60    0.499841
3.80    0.499928
4.00    0.499968
4.50    0.499997
5.00    0.4999997
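Entries of this table can be regenerated from the error function, since $\frac{1}{\sqrt{2\pi}}\int_0^u e^{-x^2/2}dx=\tfrac12\operatorname{erf}(u/\sqrt2)$. A short Python sketch using the standard library's `math.erf` (the function name `probability_integral` is only illustrative):

```python
import math


def probability_integral(u):
    """(1/sqrt(2*pi)) * integral_0^u exp(-x^2/2) dx = erf(u / sqrt(2)) / 2."""
    return 0.5 * math.erf(u / math.sqrt(2))


for u in (1.30, 1.50, 1.96, 2.50, 3.00):
    print(f"{u:.2f}  {probability_integral(u):.5f}")
```

Rounding the output to four (or, beyond $u=3$, more) decimals reproduces the tabulated values.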
INDEX


A 

Arrangements, 18 

B 

Bayes’ formula (theorem), 61 
Bernoulli criterion, 5 
Bernoulli theorem, 96 
Bernoulli trials, 45 
Bernstein, S., inequality, 205 
Bertrand’s paradox, 251 
Buffon’s needle problem, 113, 251 
Barbier’s solution of, 253 

C 

Cantelli’s theorem, 101
Cauchy’s distribution, 243, 275 
Characteristic function, composition of, 
275 

of distribution, 240, 264 
Coefficient, correlation, 339 
divergence, 212, 214, 216 
Combinations, 18 

Compound probability, theorem of, 31 
Continued fractions, 358, 361, 396 
Markoff’s method of, 52 
Continuous variables, 235 
Correlation, normal (see Normal correlation)

Correlation coefficient, distribution of, 
339 

D 

Difference equations, ordinary, 75, 78 
partial, 84 

Dispersion, definition, 172 
of sums, 173 

Distribution, Cauchy’s, 243, 275 
characteristic function of, 264 
of correlation coefficient, 339 


Distribution, determination of, 271 
equivalent point, 369 
general concept of, 263 
normal (Gaussian), 243 
Poisson’s, 279 
“Student’s,” 339

Distribution function of probability, 
239, 263 

Divergence coefficient, empirical, 212 
Lexis’ case, 214 
Poisson’s case, 214 
theoretical, 212 
Tschuprow’s theorem, 216 

E 

Elementary errors, hypothesis of, 296 
Ellipses of equal probability, 311, 328 
Estimation of error term, 295 
Euler’s summation formula, 177, 201,
303, 347 

Events, compound, 29 
contingent, 3 
dependent, 33 
equally likely, 4, 5, 7 
exhaustive, 6 
future, 65 
incompatible, 37 
independent, 32, 33 
mutually exclusive, 6, 27 
opposite, 29 

Expectation, mathematical, 161 
of a product, 171 
of a sum, 165 

F 

Factorials, 349 
Fourier theorem, 241 
French lottery, 19, 108 
Frequency, 96 

Fundamental lemma (see Limit theorem) 
Fundamental theorem (see Tshebysheff-Markoff theorem)
G

Gaussian distribution, 243 
Gaussian problem, 306 
Generating function of probabilities, 
47, 78, 85, 89, 93, 94

H 

Hermite polynomials, 72 
Hypothesis of elementary errors, 296 

I 

Independence, definition of, 32, 33
K 

Khintchine (see Law of large numbers)
Kolmogoroff (see Law of large numbers; 
Strong law of large numbers) 

L 

Lagrange’s series, 84, 150 
Laplace-Liapounoff (see Limit theorem) 
Laplace’s problem, 255 
Laurent’s series, 87, 148 
Law of large numbers, generalisation 
by Markoff, 191 

for identical variables (Khintchine), 
195 

Kolmogoroff’s lemma, 201 
theorem, 185 
Tshebysheff’s lemma, 182 
Law of repeated logarithm, 204 
Law of succession, 69 
Lexis’ case, 214 

Liapounoff condition (see Limit theorem)
Liapounoff inequality, 265 
Limit theorem, Bernoullian case, 131 
for sums of independent vectors, 318,
323, 325, 326 
fundamental lemma, 284 
Laplace-Liapounoff, 284 
Line of regression, 314 
Lottery, French (see French lottery) 

M 

Marbe’s problem, 231
Markoff’s theorem, infinite dispersion, 
191 


Markoff’s theorem, for simple chains, 301 
Markoff-Tshebysheff theorem (see
Tshebysheff-Markoff theorem) 
Mathematical expectation, definition of, 
161 

of a product, 171 
of a sum, 165 

Mathematical probability, definition of, 
6 

Moments, absolute, 240, 264 
inequalities for, 264 
method of (Markoff’s), 356ff.

N 

Normal correlation, 313 
origin of, 327 

Normal distribution, Gaussian, 243
two-dimensional, 308 

P 

Pearson’s “χ²-test,” 327
Permutations, 18 
Point, of continuity, 261, 356 
of increase, 262, 356 
Poisson series, 182, 293 
Poisson’s case, 214 
Poisson’s distribution, 279 
Poisson’s formula, 137 
Poisson’s theorem, 208, 294 
Polynomials, Hermite (see Hermite) 
Probability, approximate evaluation of
by Markoff’s method, 52 
compound, 29, 31 
conditional, 33 
definition (classical) of, 6 
total, 27, 28 

Probability integral, 128 
table of, 407 

R 

Relative frequency, 96 
Runs, problem of, 77 

S 

Simple chains, 74, 223, 297 
Markoff’s theorem for, 301 
Standard deviation, 173 
Stieltjes’ integrals, 261
Stirling’s formula, 349 
Stochastic variables, 161 
Strong law of large numbers (Kolmogoroff), 202

“Student’s” distribution, 339
T 

Table of probability integral, 407 
Tests of significance, 331 
Total probability, theorem of, 27, 28 
Trials, dependent, independent, repeated, 
44, 45 


Tschuprow (see Divergence coefficient)
Tshebysheff-Markoff theorem, fundamental, 304, 384
application, 388 
Tshebysheff’s inequalities, 373 
Tshebysheff’s inequality, 204 
Tshebysheff's lemma, 182 
Tshebysheff’s problem, 199 

V 

Variables, continuous, 235 
independent, 171 
stochastic, 161 
Vectors (see Limit theorem) 





